Only a couple of days ago, Meta dropped Llama 4 Maverick (402B total parameters, 17B active) and Scout (109B total, 17B active). Plenty of people are independently testing them. According to Artificial Analysis, Maverick sits ahead of Claude 3.7 Sonnet but behind DeepSeek’s recent V3 0324. Scout is ahead of Claude 3.5 Sonnet and Mistral Small 3.1.
Compared to DeepSeek V3, Maverick has roughly half the active parameters (17B versus 37B), so it is more efficient to run. Maverick supports a 1M token context window, while Scout has a 10M token context window. Maverick's median price is $0.24/$0.77 per million input/output tokens.
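To make the per-million pricing concrete, here is a minimal sketch of the arithmetic. The two prices are the median Maverick figures quoted above; the function name `estimate_cost` and the example workload are hypothetical, chosen only for illustration, and actual provider pricing may differ.

```python
# Median Maverick prices quoted above, in USD per million tokens.
INPUT_PRICE_PER_M = 0.24
OUTPUT_PRICE_PER_M = 0.77


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted median prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: a 200k-token prompt with a 2k-token answer.
cost = estimate_cost(200_000, 2_000)
print(f"Estimated cost: ${cost:.4f}")
```

At these prices, even a long 200k-token prompt comes to roughly five cents, and the input side dominates the bill because long-context requests carry far more input than output tokens.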
Llama 4 independent evals: Maverick (402B total, 17B active) beats Claude 3.7 Sonnet, trails DeepSeek V3 but more efficient; Scout (109B total, 17B active) in-line with GPT-4o mini, ahead of Mistral Small 3.1
We have independently benchmarked Scout and Maverick as scoring 36 and… pic.twitter.com/wwvXaTozeT
— Artificial Analysis (@ArtificialAnlys) April 6, 2025
As it turns out, it is pretty impressive at coding. You can test it out here.