The Llama 4 herd (ai.meta.com)
submitted 21 hours ago (last edited 21 hours ago) by cyrano@lemmy.dbzer0.com to c/technology@lemmy.world
Llama 4 Models:
- Both Llama 4 Scout and Llama 4 Maverick use a Mixture-of-Experts (MoE) design with 17B active parameters each.
- They are natively multimodal: text + image input, text-only output.
- Key achievements include industry-leading context lengths, strong coding/reasoning performance, and improved multilingual capabilities.
- Knowledge cutoff: August 2024.
Llama 4 Scout:
- 17B active parameters, 16 experts, 109B total.
- Fits on a single H100 GPU (INT4-quantized).
- 10M token context window.
- Outperforms previous Llama releases on multimodal tasks while being more resource-friendly.
- Employs iRoPE architecture for efficient long-context attention.
- Tested with up to 8 images per prompt.
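A quick back-of-the-envelope check (our arithmetic, not a figure from the announcement) of why Scout fits on a single H100 when INT4-quantized: 109B parameters at 0.5 bytes each come to roughly 54.5 GB of weights, under the 80 GB of an H100, leaving headroom for activations and KV cache.

```python
# Rough weight-memory estimate for Llama 4 Scout (assumptions ours).
# INT4 = 0.5 bytes per parameter; activation/KV-cache overhead ignored.
TOTAL_PARAMS = 109e9        # Scout: 109B total parameters
BYTES_PER_PARAM_INT4 = 0.5
H100_MEMORY_GB = 80         # single H100, 80 GB variant

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_INT4 / 1e9
print(f"INT4 weights: {weights_gb:.1f} GB")            # ~54.5 GB
print(f"Fits on one H100: {weights_gb < H100_MEMORY_GB}")
```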
Llama 4 Maverick:
- 17B active parameters, 128 experts, 400B total.
- 1M token context window.
- Not single-GPU; runs on one H100 DGX host or can be distributed for greater efficiency.
- Outperforms GPT-4o and Gemini 2.0 Flash on coding, reasoning, and multilingual tests at a competitive cost.
- Maintains strong image understanding and grounded reasoning ability.
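The same kind of rough estimate (again our assumptions, not official numbers) suggests why Maverick needs a full H100 DGX host rather than one GPU: 400B total parameters exceed a single 80 GB card even at INT4, but fit within the combined 8 × 80 GB of a DGX H100 host at 8-bit precision.

```python
# Rough weight-memory estimate for Llama 4 Maverick (assumptions ours).
TOTAL_PARAMS = 400e9        # Maverick: 400B total parameters
GPU_MEMORY_GB = 80          # H100, 80 GB variant
GPUS_PER_DGX = 8            # GPUs in one DGX H100 host

int4_gb = TOTAL_PARAMS * 0.5 / 1e9   # 200 GB: too large for one GPU
fp8_gb = TOTAL_PARAMS * 1.0 / 1e9    # 400 GB: fits in 8 x 80 = 640 GB
print(f"INT4: {int4_gb:.0f} GB, FP8: {fp8_gb:.0f} GB")
print(f"Single GPU (INT4): {int4_gb <= GPU_MEMORY_GB}")
print(f"One DGX host (FP8): {fp8_gb <= GPU_MEMORY_GB * GPUS_PER_DGX}")
```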
Llama 4 Behemoth (Preview):
- 288B active parameters, 16 experts, nearly 2T total.
- Still in training; not yet released.
- Exceeds GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks (e.g., MATH-500, GPQA Diamond).
- Serves as the “teacher” model for Scout and Maverick via co-distillation.
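Co-distillation means the student models (Scout, Maverick) are trained against the teacher's output distribution in addition to ground-truth labels. A minimal sketch of a standard knowledge-distillation loss, not Meta's exact recipe; the blend weight `alpha` and `temperature` are illustrative:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=2.0):
    """Blend cross-entropy on hard labels with KL divergence to the
    teacher's softened distribution (generic KD, hyperparameters illustrative)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels])
    return np.mean(alpha * hard + (1 - alpha) * (temperature ** 2) * kl)

# Toy batch: 2 examples, vocabulary of 4 tokens.
student = np.array([[2.0, 0.5, 0.1, -1.0], [0.2, 1.5, 0.3, 0.0]])
teacher = np.array([[3.0, 0.2, 0.0, -2.0], [0.1, 2.5, 0.2, -0.5]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
print(f"loss = {loss:.4f}")
```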
Misc:
- MoE Architecture: Only 17B parameters activated per token, reducing inference cost.
- Native Multimodality: Unified text + vision backbone (early fusion), pre-trained on large-scale unlabeled text, image, and video data.
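The MoE point above can be sketched as a router selecting a small top-k subset of expert FFNs per token, so only a fraction of the total parameters is computed for any given token. A toy numpy sketch under our own assumptions (sizes, k, and the gating details are illustrative, not Llama 4's actual router):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 8, 16, 1   # toy sizes; Scout has 16 experts

# Router and expert weights, random for illustration.
router_w = rng.normal(size=(D, N_EXPERTS))
experts = rng.normal(size=(N_EXPERTS, D, D))

def moe_forward(x):
    """Route each token to its top-k experts; only those experts run."""
    scores = x @ router_w                          # (tokens, experts)
    top = np.argsort(scores, axis=-1)[:, -TOP_K:]  # top-k expert ids
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = scores[t, top[t]]
        gate = np.exp(sel - sel.max())             # softmax over selected
        gate /= gate.sum()
        for g, e in zip(gate, top[t]):
            out[t] += g * (x[t] @ experts[e])      # only k experts computed
    return out

tokens = rng.normal(size=(4, D))
y = moe_forward(tokens)
active_fraction = TOP_K / N_EXPERTS
print(f"output shape {y.shape}, expert fraction active per token: {active_fraction:.4f}")
```

With 16 experts and k = 1, only 1/16 of the expert FFN weights run per token, which is the mechanism behind "17B active of 109B total".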