Llama 4: Meta Bets on Open Weights and Mixture of Experts

2025-04-05

Overview

On April 5, 2025, a Saturday (the unusual timing was widely noted as deliberate), Meta released Llama 4, its first family of models built on a Mixture-of-Experts (MoE) architecture. The release marked a significant architectural shift from previous Llama generations and positioned Meta as a serious competitor to proprietary frontier models.

Three models were announced:

  • Scout: 17B active parameters / 109B total; 10 million token context window
  • Maverick: 17B active parameters / 400B total; 1 million token context window
  • Behemoth: 288B active parameters / ~2 trillion total — announced but still in training at release

Scout and Maverick were released under Meta’s custom open weights license, which allows commercial use. Behemoth was not released.

The Mixture-of-Experts Architecture

The shift to MoE was Llama 4’s defining technical decision:

In a dense model (all previous Llama generations), every parameter is activated for every input token. In an MoE model, the network contains many “expert” sub-networks, and a router selects only a small subset of them for each token. The parameters in the selected experts, plus the shared layers, are the active parameters; the rest remain dormant for that token.
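The routing idea described above can be sketched in a few lines of plain Python. This is a toy illustration, not Meta’s implementation: the layer size, the number of experts, the top-k value, and the softmax gating are all assumptions chosen for readability, and the names `make_expert` and `moe_layer` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    """A toy 'expert': a single dim x dim linear layer with random weights."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(token):
        return [sum(w * x for w, x in zip(row, token)) for row in weights]

    return expert


def moe_layer(token, router_weights, experts, top_k):
    """Route one token to its top_k experts and mix their outputs."""
    # The router produces one score per expert for this token.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Only the top_k highest-scoring experts run; the others are skipped.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    output = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


# Illustrative sizes only (assumptions, not Llama 4's configuration).
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
rng = random.Random(0)
experts = [make_expert(DIM, rng) for _ in range(NUM_EXPERTS)]
router_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_layer(token, router_weights, experts, TOP_K)
print(f"experts used: {sorted(chosen)} out of {NUM_EXPERTS}")
```

In this sketch only 2 of the 8 expert layers do any arithmetic for the token, although all 8 must exist in memory so the router can pick any of them.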

Practical consequences:

  • Inference efficiency: A model with 400B total parameters but only 17B active per token needs roughly the per-token compute of a 17B dense model, although all 400B parameters must still be held in memory
  • Capacity without cost: The model can store far more specialized knowledge (400B total) than it expends compute to use (17B per token)
  • Specialization: Different experts can develop competency in different domains (code vs. natural language vs. science)
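The trade-off in the list above can be made concrete with the parameter counts quoted earlier. The following is a rough back-of-the-envelope sketch: the 2 bytes per parameter (16-bit weights) is an assumption, Behemoth’s total is the approximate “~2 trillion” figure, and the helper names are hypothetical.

```python
# (active parameters, total parameters) as quoted in this article.
MODELS = {
    "Scout": (17e9, 109e9),
    "Maverick": (17e9, 400e9),
    "Behemoth": (288e9, 2e12),  # total is approximate ("~2 trillion")
}

BYTES_PER_PARAM = 2  # assumption: weights stored at 16 bits each


def active_fraction(active, total):
    """Share of the model's parameters used for any single token."""
    return active / total


def weight_memory_gb(total, bytes_per_param=BYTES_PER_PARAM):
    """Memory needed just to hold the weights, in gigabytes (1e9 bytes)."""
    return total * bytes_per_param / 1e9


for name, (active, total) in MODELS.items():
    frac = active_fraction(active, total)
    mem = weight_memory_gb(total)
    print(f"{name}: {frac:.1%} of parameters active per token, ~{mem:,.0f} GB of weights")
```

Under these assumptions Maverick activates about 4% of its parameters per token, while its weights alone would occupy roughly 800 GB. Compute scales with the active count, but memory scales with the total.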

The same architectural approach is used by DeepSeek-V3 and, reportedly, by GPT-4 and some Gemini models.

Scale and Data

  • Trained on 30+ trillion tokens — double the training data of Llama 3
  • Natively multimodal: trained jointly on text, image, and video from the start, unlike previous Llama versions where multimodality was added as an adapter
  • Scout’s 10 million token context window was the largest available in an openly released model — sufficient to process an entire codebase, book, or dataset in a single prompt

The Behemoth Claim

Meta’s announcement included a controversial performance comparison for Llama 4 Behemoth (still in training):

“Llama 4 Behemoth outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks.”

The claim was immediately disputed because Behemoth was not publicly available for independent verification. It became one of several controversies around AI benchmark reporting in 2025.

Open Weights Strategy

Meta’s commitment to open weights AI was, by 2025, a deliberate strategic and philosophical position articulated by Mark Zuckerberg:

  • Open weights models could be downloaded, modified, and deployed by any organization without API dependency
  • Llama models had become a common foundation for open-source AI applications, from code assistants to enterprise fine-tuned models to research systems
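  • Meta’s economic rationale: selling model access is not Meta’s core business, so releasing weights costs it little, while broad adoption builds an ecosystem around its models and reduces its dependence on competitors’ proprietary AI stacks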

Llama 4 Scout and Maverick were available through Meta’s own Llama API, through AI Foundry for enterprise customers, and as downloadable weights for self-hosting.
