MMLU-Pro
A harder, more reasoning-focused replacement for MMLU. Each question has 10 answer choices instead of 4, and the question set is curated to remove trivially answerable items.
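A minimal scoring sketch makes the move from 4 to 10 options concrete. Under uniform random guessing, expected accuracy drops from 25% to 10%, so every score in the table sits well above chance. The sketch makes several assumptions that are not part of this page: options are labelled A through J, a chain-of-thought response ends with a phrase like "the answer is (X)", and unparseable responses count as wrong. The regex and the function names are hypothetical illustrations, not the benchmark's official evaluation harness.

```python
import re
from typing import List, Optional

# Hypothetical extraction rule: assumes the CoT prompt asks the model to end
# with "The answer is (X)", where X is one of the 10 option letters A-J.
ANSWER_RE = re.compile(r"answer is \(?([A-J])\)?")


def extract_answer(response: str) -> Optional[str]:
    """Return the last option letter announced in a response, or None."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def accuracy(responses: List[str], gold: List[str]) -> float:
    """Fraction of items whose extracted letter equals the gold letter.

    Responses with no extractable letter are scored as incorrect.
    """
    if len(responses) != len(gold):
        raise ValueError("responses and gold must have the same length")
    correct = sum(extract_answer(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of guessing uniformly at random."""
    return 1 / num_options


if __name__ == "__main__":
    print(chance_accuracy(4))   # original MMLU: 0.25
    print(chance_accuracy(10))  # MMLU-Pro: 0.1
    responses = [
        "Step 1 ... so the answer is (C).",
        "The answer is J",
        "I am not sure.",
    ]
    print(accuracy(responses, ["C", "B", "A"]))
```

Taking the last match rather than the first is a design choice for CoT outputs, where a model may mention several options while reasoning before committing to one.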
All results
| # | Model | Score (accuracy) | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT OSS 120B | 90.0% | CoT | 05 Aug 2025 | Self-reported | Primary |
| 2 | Qwen 3.7 Max | 89.6% | 0-shot · CoT · standard | 20 May 2026 | Self-reported | |
| 3 | Qwen 3.5 27B | 86.1% | — | 24 Feb 2026 | Third-party | Primary, Verified |
| 4 | Gemma 4 | 85.2% | CoT | 03 Apr 2026 | Self-reported | Primary |
| 5 | DeepSeek V3.2 Exp | 85.0% | CoT | 29 Sep 2025 | Self-reported | Primary |
| 6 | DeepSeek V3.2 | 85.0% | — | 01 Dec 2025 | Paper | Primary |
| 7 | DeepSeek V3.1 Terminus | 85.0% | — | 22 Sep 2025 | Self-reported | Primary |
| 8 | DeepSeek-R1 | 84.0% | CoT | 21 Jan 2025 | Paper | Primary |
| 9 | Llama 4 Behemoth | 82.2% | — | 05 Apr 2025 | Self-reported | Primary |
| 10 | Llama 4 Maverick | 80.5% | — | 05 Apr 2025 | Self-reported | Primary |
| 11 | Seed 1.5 | 80.1% | 0-shot · CoT | 22 Jan 2025 | Self-reported | Primary |
| 12 | Grok 3 | 79.9% | — | 19 Feb 2025 | Self-reported | Primary |
| 13 | Grok 3 mini | 78.9% | — | 19 Feb 2025 | Self-reported | Primary |
| 14 | Nemotron 3 Nano | 78.3% | — | 15 Dec 2025 | Self-reported | Primary |
| 15 | Gemma 3 | 78.0% | — | 20 May 2025 | Self-reported | Primary |
| 16 | Gemini 2.0 Flash | 77.6% | 0-shot · standard | 05 Feb 2025 | Self-reported | |
| 17 | Phi 4 reasoning plus | 76.0% | — | 08 Jul 2025 | Self-reported | Primary |
| 18 | DeepSeek V3 | 75.9% | — | 26 Dec 2024 | Paper | Primary |
| 19 | Nemotron 3 Super | 75.7% | 5-shot · CoT | 03 Apr 2026 | Self-reported | Primary |
| 20 | Claude Sonnet 3.5 | 75.1% | 0-shot · CoT · standard | 22 Oct 2024 | Self-reported | |
| 21 | Llama 4 Scout | 74.3% | — | 05 Apr 2025 | Self-reported | Primary |
| 22 | Command A | 69.6% | — | 07 Apr 2025 | Paper | Primary |
| 23 | Llama 3.3 | 68.9% | 5-shot · CoT | 06 Dec 2024 | Self-reported | Primary |
| 24 | Mistral Small 3 | 66.3% | 5-shot · CoT | 30 Jan 2025 | Self-reported | Primary |
| 25 | Claude Haiku 3 | 49.0% | 0-shot · CoT · standard | 22 Oct 2024 | Self-reported | |
| 26 | Claude Haiku 3.5 | 41.6% | 0-shot · CoT | 22 Oct 2024 | Self-reported | Primary |
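The table can also be read as a frontier over time: sort results by eval date and keep only the entries that beat every earlier score. The sketch below does this for a hand-picked subset of rows copied from the table (not the full leaderboard). The `Result` type and the `frontier` function are hypothetical names for illustration. Note that the rows mix self-reported, paper, and third-party numbers run under different conditions (0-shot vs 5-shot, with or without CoT), so a frontier built this way compares figures that are not strictly like-for-like.

```python
from datetime import date
from typing import List, NamedTuple


class Result(NamedTuple):
    model: str
    score: float  # MMLU-Pro accuracy in percent, as listed in the table
    eval_date: date


# A subset of rows copied from the table above.
RESULTS = [
    Result("Claude Sonnet 3.5", 75.1, date(2024, 10, 22)),
    Result("Llama 3.3", 68.9, date(2024, 12, 6)),
    Result("DeepSeek V3", 75.9, date(2024, 12, 26)),
    Result("DeepSeek-R1", 84.0, date(2025, 1, 21)),
    Result("Seed 1.5", 80.1, date(2025, 1, 22)),
    Result("Llama 4 Behemoth", 82.2, date(2025, 4, 5)),
    Result("GPT OSS 120B", 90.0, date(2025, 8, 5)),
    Result("Qwen 3.7 Max", 89.6, date(2026, 5, 20)),
]


def frontier(results: List[Result]) -> List[Result]:
    """Return the results that set a new best score, in eval-date order."""
    best = float("-inf")
    records: List[Result] = []
    for r in sorted(results, key=lambda r: r.eval_date):
        if r.score > best:  # strictly better than everything earlier
            best = r.score
            records.append(r)
    return records


if __name__ == "__main__":
    for r in frontier(RESULTS):
        print(f"{r.eval_date}  {r.score:.1f}%  {r.model}")
```

Using a strict `>` means a later result that only ties the current best does not count as a new frontier point.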
