MMMU
11.5k college-level questions across 30 subjects requiring image + text reasoning (charts, diagrams, medical scans, music notation, …).
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT 5.1 | 84.2% | 0-shot · CoT | 13 Nov 2025 | Self-reported | Primary |
| 2 | GPT 5 (Thinking) | 84.2% | — | 07 Aug 2025 | Self-reported | Primary |
| 3 | o3 | 82.9% | — | 16 Apr 2025 | Self-reported | Primary |
| 4 | o4 mini | 81.6% | — | 16 Apr 2025 | Self-reported | Primary |
| 5 | Claude Opus 4.5 | 80.7% | — | 24 Nov 2025 | Self-reported | Primary |
| 6 | Claude Sonnet 4.5 | 77.8% | — | 29 Sep 2025 | Self-reported | Primary |
| 7 | o1 | 77.6% | — | 16 Apr 2025 | Self-reported | Primary |
| 8 | Qwen 3.5 122B A10B | 76.9% | — | 24 Apr 2026 | Third-party | Primary Verified |
| 9 | Llama 4 Behemoth | 76.1% | — | 05 Apr 2025 | Self-reported | Primary |
| 10 | GPT 4.1 | 75.0% | — | 14 Apr 2025 | Self-reported | Primary |
| 11 | Claude Sonnet 3.7 (Thinking) | 75.0% | — | 24 Feb 2025 | Self-reported | Primary |
| 12 | Claude Sonnet 4 | 74.4% | — | 22 May 2025 | Self-reported | Primary |
| 13 | GPT 5 | 74.4% | — | 07 Aug 2025 | Self-reported | Primary |
| 14 | Seed 1.5 | 73.9% | — | 22 Jan 2025 | Self-reported | Primary |
| 15 | Llama 4 Maverick | 73.4% | — | 05 Apr 2025 | Self-reported | Primary |
| 16 | Claude Haiku 4.5 | 73.2% | — | 15 Oct 2025 | Self-reported | Primary |
| 17 | Grok 3 | 73.2% | — | 19 Feb 2025 | Self-reported | Primary |
| 18 | Gemini 2.5 Flash-Lite | 72.9% | — | 26 Sep 2025 | Self-reported | Primary |
| 19 | Claude Sonnet 3.7 | 71.8% | — | 24 Feb 2025 | Self-reported | Primary |
| 20 | Llama 4 Scout | 69.4% | — | 05 Apr 2025 | Self-reported | Primary |
| 21 | Grok 3 mini | 69.4% | — | 19 Feb 2025 | Self-reported | Primary |
| 22 | GPT-4o | 69.1% | — | 16 Apr 2025 | Self-reported | Primary |
| 23 | Claude Sonnet 3.5 | 68.3% | 0-shot · standard | 20 Jun 2024 | Self-reported | |
| 24 | Pixtral Large | 64.0% | CoT | 18 Nov 2024 | Self-reported | Primary |
| 25 | Gemini 1.5 | 62.2% | 0-shot · standard | 01 May 2024 | Self-reported | |
| 26 | Gemini Ultra | 59.4% | 0-shot · standard | 06 Dec 2023 | Self-reported | |
| 27 | Pixtral 12B | 52.0% | CoT | 10 Oct 2024 | Self-reported | Primary |
| 28 | Claude Haiku 3 | 50.2% | 0-shot · standard | 04 Mar 2024 | Self-reported | |
