Overview
On April 5, 2025, a Saturday whose unusual timing was widely read as deliberate, Meta released Llama 4, its first models built on a Mixture-of-Experts (MoE) architecture. The release marked a significant architectural shift from previous Llama generations and positioned Meta as a direct competitor to proprietary frontier models.
Three models were announced:
- Scout: 17B active parameters / 109B total; 10 million token context window
- Maverick: 17B active parameters / 400B total; 1 million token context window
- Behemoth: 288B active parameters / ~2 trillion total — announced but still in training at release
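Scout and Maverick were released under Meta’s custom open weights license (the Llama 4 Community License), which permits commercial use subject to the license’s conditions. Behemoth was announced but not released.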
The Mixture-of-Experts Architecture
The shift to MoE was Llama 4’s defining technical decision:
In a dense model (all previous Llama generations), every parameter is activated for every input token. In an MoE model, the network contains many “expert” sub-networks, and a learned router sends each token to only a small subset of them. The parameters in that subset, plus the layers shared by all tokens, are the active parameters; the rest remain dormant for that token.
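The per-token expert selection described above can be sketched in a few lines of plain Python. This is a generic top-k routing toy, not Meta's implementation: the function names (`moe_layer`, `make_expert`), the scaling "experts", and the router weights are all hypothetical, and real experts are feed-forward networks rather than simple multipliers.

```python
import math
import random

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    """Hypothetical stand-in for an expert: it only scales its input."""
    return lambda x: [scale * v for v in x]

def moe_layer(token, router_weights, experts, top_k=1):
    """Route one token vector to its top_k experts and mix their outputs."""
    # One router score per expert: dot product of the token with that expert's row.
    scores = [sum(w * v for w, v in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    # Only the top_k highest-probability experts run; the others do no work.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(token)
    for i in chosen:
        gate = probs[i] / norm  # renormalise gates over the chosen experts
        expert_out = experts[i](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen

# Illustrative run with made-up sizes: 8 experts, 2 active per token.
random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(i + 1) for i in range(num_experts)]
router_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
output, chosen = moe_layer([0.5, -1.0, 0.25, 2.0], router_weights, experts, top_k=2)
print(f"experts used: {sorted(chosen)} of {num_experts}")
```

The point of the sketch is the loop over `chosen`: compute is spent only on the selected experts, while every expert still has to exist in memory so the router can pick it.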
Practical consequences:
- Inference efficiency: A model with 400B total parameters but only 17B active per token needs roughly the per-token compute of a 17B dense model. All 400B parameters must still be held in memory, so the hardware footprint is that of the full model
- Capacity without cost: The model can store far more knowledge (400B total parameters) than it spends compute to use (17B per token)
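- Specialization: Different experts can, in principle, develop competency in different kinds of input (code vs. natural language vs. science), although learned specialization does not always fall along such clean domain lines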
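The compute-versus-capacity trade-off above is simple arithmetic. The sketch below uses the parameter counts from the model list and one labeled assumption: that weights are stored at 16-bit precision (2 bytes per parameter). The helper names are hypothetical, and the memory figure covers weights only, not activations or the attention cache.

```python
BYTES_PER_PARAM_16BIT = 2  # assumption: 16-bit weights; quantized deployments use less

# Parameter counts as stated in the announcement.
MODELS = {
    "Scout": {"active": 17e9, "total": 109e9},
    "Maverick": {"active": 17e9, "total": 400e9},
}

def active_fraction(active, total):
    """Share of the model's parameters that do work for any single token."""
    return active / total

def weight_memory_gb(total, bytes_per_param=BYTES_PER_PARAM_16BIT):
    """Memory needed just to hold the weights, in gigabytes (10^9 bytes)."""
    return total * bytes_per_param / 1e9

for name, params in MODELS.items():
    frac = active_fraction(params["active"], params["total"])
    mem = weight_memory_gb(params["total"])
    print(f"{name}: {frac:.1%} of parameters active per token, ~{mem:.0f} GB of weights")
```

For Maverick this works out to about 4% of parameters active per token against roughly 800 GB of 16-bit weights, which is why per-token compute resembles a small model while the serving hardware does not.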
This is the same architectural choice that powers GPT-4 (reportedly), Gemini (partially), and DeepSeek-V3.
Scale and Data
- Trained on 30+ trillion tokens, more than double the roughly 15 trillion tokens used for Llama 3
- Natively multimodal: trained jointly on text, image, and video from the start, unlike previous Llama versions where multimodality was added as an adapter
- Scout’s 10 million token context window was, at release, the largest available in an openly released model. In principle that is enough to fit an entire codebase, a book, or a large dataset in a single prompt
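To get a feel for what these context windows hold, a rough token estimate is enough. The sketch below assumes about 4 characters per token, a common rule of thumb for English text and not a property of Llama 4's actual tokenizer; the function names and the sample corpus are hypothetical.

```python
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, real tokenizers vary by content

# Context windows as stated in the announcement, in tokens.
CONTEXT_WINDOWS = {"Scout": 10_000_000, "Maverick": 1_000_000}

def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // chars_per_token)

def fits_in_context(documents, context_window, reserved_for_output=0):
    """Return (fits, tokens_needed) for sending all documents in one prompt."""
    needed = sum(estimate_tokens(doc) for doc in documents)
    return needed + reserved_for_output <= context_window, needed

# Hypothetical corpus: 50 files of 400,000 characters each.
corpus = ["x" * 400_000] * 50
for model, window in CONTEXT_WINDOWS.items():
    ok, needed = fits_in_context(corpus, window)
    print(f"{model}: needs ~{needed:,} tokens, fits = {ok}")
```

Under these assumptions the sample corpus needs about 5 million tokens, so it fits in Scout's window but not in Maverick's. A real check should count tokens with the model's own tokenizer.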
The Behemoth Claim
Meta’s announcement included a controversial performance comparison for Llama 4 Behemoth (still in training):
“Llama 4 Behemoth outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks.”
The claim was immediately disputed because Behemoth was not publicly available, so the results could not be independently verified. It became one of several controversies over AI benchmark reporting in 2025.
Open Weights Strategy
Meta’s commitment to open weights AI was, by 2025, a deliberate strategic and philosophical position articulated by Mark Zuckerberg:
- Open weights models could be downloaded, modified, and deployed by any organization without API dependency
- Llama models had become a foundation for a large share of open-source AI applications, from code assistants to enterprise fine-tuned models to research systems
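- Meta’s economic rationale: selling model access is not Meta’s core business, so releasing weights costs it little revenue, while broad adoption builds an ecosystem around its models and reduces its dependence on competitors’ proprietary AI stacks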
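Llama 4 Scout and Maverick were available as downloadable weights for self-hosting (from llama.com and Hugging Face), through partner cloud platforms such as Microsoft’s Azure AI Foundry (enterprise), and through Meta’s own Llama API, which was announced in limited preview later in April 2025.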