MMLU
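Multiple-choice questions across 57 academic subjects spanning the humanities, STEM, the social sciences, and professional fields. The standard metric is 5-shot accuracy, but many entries below report other conditions (0-shot, chain-of-thought) or do not state them, so scores are not strictly comparable. Largely saturated by frontier models: the top entries cluster around 90%.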
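As a rough illustration of what "5-shot accuracy" means, the sketch below builds a prompt from a handful of solved exemplars followed by one unsolved question, then scores the first letter of a model's reply against the gold answer. The prompt layout, the function names, and the `predict` callable are assumptions made for this example. They are not the official MMLU harness or the setup used by any model listed here.

```python
from typing import Callable, Optional, Sequence, Tuple

CHOICES = "ABCD"  # MMLU questions have four answer options

# (question, options, gold answer letter); a hypothetical item layout for this sketch
Item = Tuple[str, Sequence[str], str]


def format_question(question: str, options: Sequence[str],
                    answer: Optional[str] = None) -> str:
    """Render one question; include the answer letter only for solved exemplars."""
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in zip(CHOICES, options)]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(exemplars: Sequence[Item], question: str,
                          options: Sequence[str]) -> str:
    """Concatenate k solved exemplars, then the unsolved test question."""
    blocks = [format_question(q, o, a) for q, o, a in exemplars]
    blocks.append(format_question(question, options))
    return "\n\n".join(blocks)


def few_shot_accuracy(predict: Callable[[str], str],
                      exemplars: Sequence[Item],
                      test_items: Sequence[Item]) -> float:
    """Fraction of test items where the reply's first letter matches the gold letter."""
    correct = 0
    for question, options, gold in test_items:
        prompt = build_few_shot_prompt(exemplars, question, options)
        predicted = predict(prompt).strip()[:1].upper()
        correct += predicted == gold
    return correct / len(test_items)
```

Passing five exemplars gives the 5-shot condition and passing an empty list gives 0-shot. Real harnesses differ in prompt wording and answer extraction, which is one reason scores reported under different conditions are hard to compare directly.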
[Charts: Best results; Frontier over time]
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 3.5 | 90.4% | 0-shot · standard | 20 Jun 2024 | Self-reported | |
| 2 | GPT-4.1 | 90.2% | — | 14 Apr 2025 | Self-reported | Primary |
| 3 | Gemini Ultra | 90.0% | 0-shot · CoT · standard | 06 Dec 2023 | Self-reported | |
| 4 | GPT-4o | 88.7% | — | 16 Apr 2025 | Self-reported | Primary |
| 5 | Seed 1.5 | 88.6% | — | 22 Jan 2025 | Self-reported | Primary |
| 6 | Nova Premier | 87.4% | — | 30 Apr 2025 | Self-reported | Primary |
| 7 | Claude Opus 3 | 86.8% | — | 04 Mar 2024 | Self-reported | Primary |
| 8 | GPT-4 | 86.4% | 5-shot | 01 Jan 2024 | Paper | Primary |
| 9 | Nemotron 3 Super | 86.0% | 5-shot | 03 Apr 2026 | Self-reported | Primary |
| 10 | Llama 3.3 | 86.0% | 0-shot · CoT | 06 Dec 2024 | Self-reported | Primary |
| 11 | Nova Pro | 85.9% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary |
| 12 | Gemini 1.5 | 85.9% | 5-shot · standard | 01 May 2024 | Self-reported | |
| 13 | Command A | 85.5% | — | 07 Apr 2025 | Self-reported | Primary |
| 14 | Mistral Large | 81.2% | 5-shot | 26 Feb 2024 | Self-reported | |
| 15 | Nova Lite | 80.5% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary |
| 16 | Gemini 1.5 Flash | 78.9% | 5-shot · standard | 01 May 2024 | Self-reported | |
| 17 | Claude 2 | 78.5% | 5-shot · CoT · standard | 11 Jul 2023 | Self-reported | |
| 18 | Nova Micro | 77.6% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary |
| 19 | Command R Plus | 75.7% | — | 04 Apr 2024 | Self-reported | Primary |
| 20 | Claude Haiku 3 | 75.2% | 5-shot · standard | 04 Mar 2024 | Self-reported | |
| 21 | DBRX Instruct | 73.7% | 5-shot | 27 Mar 2024 | Self-reported | Primary |
| 22 | Mixtral 8x7B | 70.6% | — | 01 Dec 2023 | Paper | Primary |
| 23 | Mixtral 8x22B | 70.6% | — | 08 Jan 2024 | Paper | Primary |
| 24 | Mixtral 8x7B | 70.6% | 5-shot | 08 Jan 2024 | Paper | |
| 25 | GPT-3.5 | 70.0% | 5-shot · standard | 14 Mar 2023 | Self-reported | |
| 26 | Pixtral 12B | 69.2% | 5-shot | 10 Oct 2024 | Self-reported | Primary |
| 27 | LLaMA 2 | 68.9% | 5-shot | 19 Jul 2023 | Paper | Primary Verified |
| 28 | LLaMA 2 70B | 68.9% | 5-shot | 11 Jul 2023 | Paper | |
| 29 | Mistral NeMo | 68.0% | 5-shot | 18 Jul 2024 | Self-reported | Primary |
| 30 | Llama 3.2 | 63.4% | — | 25 Sep 2024 | Self-reported | Primary |
| 31 | Mistral 7B | 60.1% | — | 01 Sep 2023 | Paper | Primary |
| 32 | Mistral 7B | 60.1% | 5-shot | 10 Oct 2023 | Paper | |
| 33 | Gemma 2 | 51.3% | 5-shot | 25 Feb 2025 | Self-reported | Primary |
