MMLU-Pro

A harder, more reasoning-focused replacement for MMLU. It offers 10 answer choices instead of 4 and is curated to remove trivially answerable items.
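
One way to see why more answer choices make the benchmark harder is to compare the accuracy that uniform random guessing would reach. The sketch below assumes every item has exactly the stated number of options; `chance_accuracy` is a hypothetical helper, not part of any benchmark tooling.

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on one multiple-choice item."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return 1.0 / num_choices


# Assumption: all MMLU items have 4 options and all MMLU-Pro items have 10.
mmlu_floor = chance_accuracy(4)
mmlu_pro_floor = chance_accuracy(10)

print(f"MMLU chance floor:     {mmlu_floor:.1%}")
print(f"MMLU-Pro chance floor: {mmlu_pro_floor:.1%}")
```

Under that assumption, a score just above 25% on MMLU carries little signal, while the same score on MMLU-Pro is well above the guessing floor.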

Knowledge · Text · Accuracy · Max 100.0% · Released Jun 2024

Results: 23
Models scored: 22
Top: 90.0% (GPT OSS 120B)
Median: 79.9%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: best score over time. Y-axis: score, 0 to 100. X-axis: evaluation date, Oct 2024 to Apr 2026.]
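
The running-best line can be sketched as a cumulative maximum over results sorted by evaluation date. The rows below are a hand-picked subset of the Primary results listed on this page. Whether the chart itself uses only Primary rows is an assumption.

```python
from datetime import date
from itertools import accumulate

# (eval date, model, score in %) for a subset of the listed results.
results = [
    (date(2025, 4, 5), "Llama 4 Behemoth", 82.2),
    (date(2024, 10, 22), "Claude Haiku 3.5", 41.6),
    (date(2025, 8, 5), "GPT OSS 120B", 90.0),
    (date(2024, 12, 26), "DeepSeek V3", 75.9),
    (date(2026, 2, 24), "Qwen 3.5 27B", 86.1),
    (date(2025, 1, 21), "DeepSeek-R1", 84.0),
    (date(2024, 12, 6), "Llama 3.3", 68.9),
]

# Sort chronologically, then take the cumulative maximum of the scores.
ordered = sorted(results, key=lambda row: row[0])
running_best = list(accumulate((score for _, _, score in ordered), max))

for (when, model, score), best in zip(ordered, running_best):
    print(f"{when.isoformat()}  {model:<18} {score:5.1f}  best so far: {best:5.1f}")
```

The frontier moves only when a new result beats every earlier one, so Llama 4 Behemoth (82.2%) and Qwen 3.5 27B (86.1%) appear as dots below the line.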

All results

Showing all configurations, including non-primary alternates.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|-------|-------|------------|-----------|--------|-------|
| 1 | GPT OSS 120B | 90.0% | CoT | 05 Aug 2025 | Self-reported | Primary |
| 2 | Qwen 3.7 Max | 89.6% | 0-shot · CoT · standard | 20 May 2026 | Self-reported | |
| 3 | Qwen 3.5 27B | 86.1% | | 24 Feb 2026 | Third-party | Primary, Verified |
| 4 | Gemma 4 | 85.2% | CoT | 03 Apr 2026 | Self-reported | Primary |
| 5 | DeepSeek V3.2 Exp | 85.0% | CoT | 29 Sep 2025 | Self-reported | Primary |
| 6 | Deepseek 3.2 | 85.0% | | 01 Dec 2025 | Paper | Primary |
| 7 | DeepSeek V3.1 Terminus | 85.0% | | 22 Sep 2025 | Self-reported | Primary |
| 8 | DeepSeek-R1 | 84.0% | CoT | 21 Jan 2025 | Paper | Primary |
| 9 | Llama 4 Behemoth | 82.2% | | 05 Apr 2025 | Self-reported | Primary |
| 10 | Llama 4 Maverick | 80.5% | | 05 Apr 2025 | Self-reported | Primary |
| 11 | Seed 1.5 | 80.1% | 0-shot · CoT | 22 Jan 2025 | Self-reported | Primary |
| 12 | Grok 3 | 79.9% | | 19 Feb 2025 | Self-reported | Primary |
| 13 | Grok 3 | 79.9% | | 19 Feb 2025 | Self-reported | Primary |
| 14 | Grok 3 mini | 78.9% | | 19 Feb 2025 | Self-reported | Primary |
| 15 | Nemotron 3 Nano | 78.3% | | 15 Dec 2025 | Self-reported | Primary |
| 16 | Gemma 3 | 78.0% | | 20 May 2025 | Self-reported | Primary |
| 17 | Gemini 2.0 Flash | 77.6% | 0-shot · standard | 05 Feb 2025 | Self-reported | |
| 18 | Phi 4 reasoning plus | 76.0% | | 08 Jul 2025 | Self-reported | Primary |
| 19 | DeepSeek V3 | 75.9% | | 26 Dec 2024 | Paper | Primary |
| 20 | Nemotron 3 Super | 75.7% | 5-shot · CoT | 03 Apr 2026 | Self-reported | Primary |
| 21 | Claude Sonnet 3.5 | 75.1% | 0-shot · CoT · standard | 22 Oct 2024 | Self-reported | |
| 22 | Llama 4 Scout | 74.3% | | 05 Apr 2025 | Self-reported | Primary |
| 23 | Command A | 69.6% | | 07 Apr 2025 | Paper | Primary |
| 24 | Llama 3.3 | 68.9% | 5-shot · CoT | 06 Dec 2024 | Self-reported | Primary |
| 25 | Mistral Small 3 | 66.3% | 5-shot · CoT | 30 Jan 2025 | Self-reported | Primary |
| 26 | Claude Haiku 3 | 49.0% | 0-shot · CoT · standard | 22 Oct 2024 | Self-reported | |
| 27 | Claude Haiku 3.5 | 41.6% | 0-shot · CoT | 22 Oct 2024 | Self-reported | Primary |
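
The page does not say how its summary figures (results, models scored, top, median) are derived. The sketch below assumes they are computed over the rows flagged Primary, with models counted by distinct name; under that assumption the listed scores reproduce the summary figures.

```python
from statistics import median

# (model, score in %) for every row flagged Primary in the table.
primary = [
    ("GPT OSS 120B", 90.0), ("Qwen 3.5 27B", 86.1), ("Gemma 4", 85.2),
    ("DeepSeek V3.2 Exp", 85.0), ("Deepseek 3.2", 85.0),
    ("DeepSeek V3.1 Terminus", 85.0), ("DeepSeek-R1", 84.0),
    ("Llama 4 Behemoth", 82.2), ("Llama 4 Maverick", 80.5),
    ("Seed 1.5", 80.1), ("Grok 3", 79.9), ("Grok 3", 79.9),
    ("Grok 3 mini", 78.9), ("Nemotron 3 Nano", 78.3), ("Gemma 3", 78.0),
    ("Phi 4 reasoning plus", 76.0), ("DeepSeek V3", 75.9),
    ("Nemotron 3 Super", 75.7), ("Llama 4 Scout", 74.3),
    ("Command A", 69.6), ("Llama 3.3", 68.9), ("Mistral Small 3", 66.3),
    ("Claude Haiku 3.5", 41.6),
]

num_results = len(primary)                          # one entry per Primary row
num_models = len({model for model, _ in primary})   # distinct model names
top_model, top_score = max(primary, key=lambda row: row[1])
median_score = median(score for _, score in primary)

print(f"Results: {num_results}")
print(f"Models scored: {num_models}")
print(f"Top: {top_score:.1f}% ({top_model})")
print(f"Median: {median_score:.1f}%")
```

The gap between 23 results and 22 models comes from Grok 3, which appears twice with the same score and date.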