Llama 4: Meta Bets on Open Weights and Mixture of Experts

Overview

On April 5, 2025, a Saturday and an unusual release day that was widely remarked on, Meta released Llama 4, its first model family built on a Mixture-of-Experts (MoE) architecture. The release was a significant architectural departure from earlier Llama generations, all of which were dense models, and it positioned Meta’s open-weights models as direct competitors to proprietary frontier models.

Three models were announced:

Scout: 17B active parameters / 109B total; 10 million token context window
Maverick: 17B active parameters / 400B total; 1 million token context window
Behemoth: 288B active parameters / ~2 trillion total — announced but still in training at release

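Scout and Maverick were released under Meta’s custom open-weights license (the Llama 4 Community License), which permits commercial use subject to the license’s conditions. Behemoth’s weights were not released.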

The Mixture-of-Experts Architecture

The shift to MoE was Llama 4’s defining technical decision:

In a dense model (all previous Llama generations), every parameter is used for every input token. In an MoE model, the network contains many “expert” sub-networks, and a learned router sends each token to only a small subset of them. The parameters actually used for a given token are the active parameters; the rest are skipped for that token.
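The routing idea can be illustrated with a toy example. The following is a minimal sketch of generic top-k expert routing, not Meta’s implementation: the function names, the eight toy experts, and the choice of `top_k=2` are all assumptions made for illustration, and the source does not say how many experts Llama 4 selects per token.

```python
import math
import random

def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}

def moe_layer(token, experts, router_scores, top_k=2):
    """Run only the chosen experts and mix their outputs by router weight."""
    weights = route_token(router_scores, top_k)
    return sum(w * experts[i](token) for i, w in weights.items())

# Hypothetical toy setup: 8 "experts", each of which just scales its input.
NUM_EXPERTS = 8
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]

# In a real model the router scores come from a learned layer; here they are random.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(scores, top_k=2)
output = moe_layer(1.0, experts, scores, top_k=2)
print(f"experts used: {sorted(selected)} of {NUM_EXPERTS}")
```

Only the two selected expert functions are ever called for this token; the other six contribute nothing to the computation, which is the sense in which their parameters are not active.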

Practical consequences:

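Inference efficiency: A model with 400B total parameters but only 17B active per token needs roughly the per-token compute of a 17B dense model, although all 400B parameters must still be held in memory and routing adds some overhead
Capacity without cost: The model can store far more knowledge (400B total parameters) than it spends compute to use (17B per token)
Specialization: Different experts can, in principle, specialize in different kinds of input (code vs. natural language vs. science), though learned specialization does not always follow such clean domain lines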
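The gap between compute and storage can be made concrete with some back-of-the-envelope arithmetic. This sketch uses the parameter counts quoted above; the 2 bytes per parameter (16-bit weights) and the rule of thumb of about 2 floating-point operations per active parameter per token are assumptions, and the helper names are hypothetical.

```python
# Parameter counts in billions, taken from the figures quoted above.
MODELS = {
    "Scout": {"active_b": 17, "total_b": 109},
    "Maverick": {"active_b": 17, "total_b": 400},
    "Behemoth": {"active_b": 288, "total_b": 2000},  # total is approximate
}

def active_fraction(active_b, total_b):
    """Share of the model's parameters used for any single token."""
    return active_b / total_b

def weight_memory_gb(total_b, bytes_per_param=2):
    """Memory for the weights alone; assumes 16-bit weights by default."""
    # billions of parameters * bytes per parameter = gigabytes
    return total_b * bytes_per_param

def forward_flops_per_token(active_b):
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token."""
    return 2 * active_b * 1e9

for name, p in MODELS.items():
    frac = active_fraction(p["active_b"], p["total_b"])
    mem = weight_memory_gb(p["total_b"])
    flops = forward_flops_per_token(p["active_b"])
    print(f"{name}: {frac:.1%} active, ~{mem} GB of weights, ~{flops:.1e} FLOPs/token")
```

Under these assumptions, Scout and Maverick cost about the same compute per token, but Maverick needs several times more memory just to hold its weights. Per-token compute tracks active parameters, while hardware requirements track total parameters.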

The same architectural approach is used by DeepSeek-V3, by some Gemini models, and reportedly by GPT-4.

Scale and Data

Trained on more than 30 trillion tokens, roughly double the training data of Llama 3
Natively multimodal: trained jointly on text, image, and video from the start, unlike previous Llama versions where multimodality was added as an adapter
Scout’s 10 million token context window was the largest available in an openly released model at launch, in principle enough to fit an entire codebase or several books in a single prompt
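Whether a given body of text fits in such a window can be estimated roughly. The sketch below assumes about 4 characters per token, a common heuristic for English text that varies by tokenizer and content; the helper names and the output reserve of 8,192 tokens are hypothetical choices, not properties of Llama 4.

```python
import math

SCOUT_CONTEXT_TOKENS = 10_000_000  # context window quoted above
CHARS_PER_TOKEN = 4.0              # assumption: rough heuristic for English text

def estimate_tokens(num_chars, chars_per_token=CHARS_PER_TOKEN):
    """Rough token count for a text of num_chars characters."""
    return math.ceil(num_chars / chars_per_token)

def fits_in_context(num_chars, window=SCOUT_CONTEXT_TOKENS, reserved_for_output=8_192):
    """True if the estimated prompt plus an output reserve fits in the window."""
    return estimate_tokens(num_chars) + reserved_for_output <= window

# Hypothetical inputs: a 1 MB book and a 50 MB source tree, measured in characters.
for label, size in [("book", 1_000_000), ("codebase", 50_000_000)]:
    print(label, estimate_tokens(size), fits_in_context(size))
```

An accurate count requires running the model’s actual tokenizer, and fitting in the window says nothing about how well the model uses information spread across that much text.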

The Behemoth Claim

Meta’s announcement included a controversial performance comparison for Llama 4 Behemoth (still in training):

“Llama 4 Behemoth outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks.”

The claim was immediately disputed: because Behemoth was not publicly available, the results could not be independently verified. It became one of several controversies over AI benchmark reporting in 2025.

Open Weights Strategy

Meta’s commitment to open weights AI was, by 2025, a deliberate strategic and philosophical position articulated by Mark Zuckerberg:

Open weights models could be downloaded, modified, and deployed by any organization without API dependency
Llama models had become the foundation for a large share of open-weights AI applications, from code assistants to enterprise fine-tuned models to research systems
Meta’s economic rationale: its revenue does not depend on selling model access, so releasing weights costs it little, while a widely adopted open ecosystem reduces its dependence on competitors’ proprietary AI stacks

Llama 4 Scout and Maverick were available for download and self-hosting, and through hosted services including Meta’s own Llama API and Microsoft’s Azure AI Foundry (enterprise).