MMLU

Massive Multitask Language Understanding

Multiple-choice questions across 57 academic subjects (humanities, STEM, social sciences, professional). The standard metric is 5-shot accuracy. The benchmark is largely saturated by frontier models.
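To make the 5-shot accuracy metric concrete, here is a minimal sketch of how a few-shot prompt and the resulting score might be computed. The `Question` type, the prompt layout, and the function names are hypothetical and chosen for illustration; the sketch also assumes four answer options labeled A to D. Actual evaluation harnesses differ in formatting and answer extraction.

```python
from dataclasses import dataclass

CHOICE_LABELS = "ABCD"  # assumption: four options per question


@dataclass
class Question:
    text: str
    choices: list[str]  # one string per option, in label order
    answer: str         # gold label, e.g. "B"


def format_question(q: Question, include_answer: bool) -> str:
    """Render one question; solved examples end with their gold answer."""
    lines = [q.text]
    lines += [f"{label}. {choice}" for label, choice in zip(CHOICE_LABELS, q.choices)]
    lines.append(f"Answer: {q.answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(examples: list[Question], target: Question, k: int = 5) -> str:
    """Prepend k solved examples (the 'shots') to the unsolved target question."""
    shots = [format_question(e, include_answer=True) for e in examples[:k]]
    return "\n\n".join(shots + [format_question(target, include_answer=False)])


def accuracy(predictions: list[str], questions: list[Question]) -> float:
    """Percentage of questions whose predicted label matches the gold label."""
    if not questions:
        raise ValueError("no questions to score")
    correct = sum(p.strip().upper() == q.answer for p, q in zip(predictions, questions))
    return 100.0 * correct / len(questions)
```

A 0-shot condition corresponds to `k=0` in this sketch. The CoT conditions listed in the results would need a different prompt and an answer-extraction step, neither of which is shown here.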

Knowledge · Text · Accuracy · Max 100.0% · Released Sep 2020 · Saturated · Possibly contaminated
Results: 22
Models scored: 22
Top: GPT 4.1 (90.2%)
Median: 79.1%
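The summary figures can be cross-checked against the rows flagged Primary in the "All results" table. The sketch below assumes the page computes its top score and median over those 22 primary rows only; that is an inference from the counts, not something the page states. The `summarize` helper is a hypothetical name.

```python
from statistics import median

# Scores of the rows flagged "Primary" in the "All results" table, in percent.
primary_scores = [
    90.2, 88.7, 88.6, 87.4, 86.8, 86.4, 86.0, 86.0, 85.9, 85.5, 80.5,
    77.6, 75.7, 73.7, 70.6, 70.6, 69.2, 68.9, 68.0, 63.4, 60.1, 51.3,
]


def summarize(scores: list[float]) -> dict[str, float]:
    """Count, best score, and median of a list of percentage scores."""
    return {"count": len(scores), "top": max(scores), "median": median(scores)}


summary = summarize(primary_scores)
```

Under that assumption the count is 22 and the top score is 90.2, and the median falls midway between 80.5 and 77.6 (about 79.05), which is consistent with the 79.1% shown.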

Best results

Top primary scores; one row per model.
1 90.2%
2 88.7%
3 88.6%
6 86.4%
8 86.0%
9 85.9%
10 85.5%

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: Best score over time. Y-axis: score, 0 to 100. X-axis: Jul 2023 to Apr 2026.]
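The running-best line described above is a cumulative maximum over results sorted by evaluation date. The following is a minimal sketch using a handful of rows from the "All results" table; which rows the chart actually plots is not stated, so the selection here is illustrative only, and `running_best` is a hypothetical helper name.

```python
from datetime import date
from itertools import accumulate

# (eval date, model, score in percent): a small illustrative subset of the table.
results = [
    (date(2024, 1, 1), "GPT-4", 86.4),
    (date(2023, 7, 19), "LLaMA 2", 68.9),
    (date(2025, 4, 14), "GPT 4.1", 90.2),
    (date(2023, 9, 1), "Mistral 7B", 60.1),
    (date(2024, 12, 6), "Llama 3.3", 86.0),
    (date(2024, 3, 4), "Claude Opus 3", 86.8),
]


def running_best(rows):
    """Return (date, best score so far) pairs in chronological order."""
    ordered = sorted(rows, key=lambda r: r[0])
    # accumulate with max yields the cumulative maximum of the scores.
    best = accumulate((score for _, _, score in ordered), max)
    return [(d, b) for (d, _, _), b in zip(ordered, best)]


frontier = running_best(results)
```

Each dot in the chart corresponds to one input row, while the line corresponds to the second element of each pair in `frontier`; the line only moves when a new result exceeds every earlier one.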

All results

Showing all configurations, including non-primary alternates.
# | Model | Score | Conditions | Eval date | Source | Flags
1 | Claude Sonnet 3.5 | 90.4% | 0-shot · standard | 20 Jun 2024 | Self-reported | -
2 | GPT 4.1 | 90.2% | - | 14 Apr 2025 | Self-reported | Primary
3 | Gemini Ultra | 90.0% | 0-shot · CoT · standard | 06 Dec 2023 | Self-reported | -
4 | GPT-4o | 88.7% | - | 16 Apr 2025 | Self-reported | Primary
5 | Seed 1.5 | 88.6% | - | 22 Jan 2025 | Self-reported | Primary
6 | Nova Premier | 87.4% | - | 30 Apr 2025 | Self-reported | Primary
7 | Claude Opus 3 | 86.8% | - | 04 Mar 2024 | Self-reported | Primary
8 | GPT-4 | 86.4% | 5-shot | 01 Jan 2024 | Paper | Primary
9 | Nemotron 3 Super | 86.0% | 5-shot | 03 Apr 2026 | Self-reported | Primary
10 | Llama 3.3 | 86.0% | 0-shot · CoT | 06 Dec 2024 | Self-reported | Primary
11 | Nova Pro | 85.9% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary
12 | Gemini 1.5 | 85.9% | 5-shot · standard | 01 May 2024 | Self-reported | -
13 | Command A | 85.5% | - | 07 Apr 2025 | Self-reported | Primary
14 | Mistral Large | 81.2% | 5-shot | 26 Feb 2024 | Self-reported | -
15 | Nova Lite | 80.5% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary
16 | Gemini 1.5 Flash | 78.9% | 5-shot · standard | 01 May 2024 | Self-reported | -
17 | Claude 2 | 78.5% | 5-shot · CoT · standard | 11 Jul 2023 | Self-reported | -
18 | Nova Micro | 77.6% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary
19 | Command R Plus | 75.7% | - | 04 Apr 2024 | Self-reported | Primary
20 | Claude Haiku 3 | 75.2% | 5-shot · standard | 04 Mar 2024 | Self-reported | -
21 | DBRX Instruct | 73.7% | 5-shot | 27 Mar 2024 | Self-reported | Primary
22 | Mixtral 8x7B | 70.6% | - | 01 Dec 2023 | Paper | Primary
23 | Mixtral 8x22B | 70.6% | - | 08 Jan 2024 | Paper | Primary
24 | Mixtral 8x7B | 70.6% | 5-shot | 08 Jan 2024 | Paper | -
25 | GPT 3.5 | 70.0% | 5-shot · standard | 14 Mar 2023 | Self-reported | -
26 | Pixtral 12B | 69.2% | 5-shot | 10 Oct 2024 | Self-reported | Primary
27 | LLaMA 2 | 68.9% | 5-shot | 19 Jul 2023 | Paper | Primary, Verified
28 | LLaMA 2 70B | 68.9% | 5-shot | 11 Jul 2023 | Paper | -
29 | Mistral NeMo | 68.0% | 5-shot | 18 Jul 2024 | Self-reported | Primary
30 | Llama 3.2 | 63.4% | - | 25 Sep 2024 | Self-reported | Primary
31 | Mistral 7B | 60.1% | - | 01 Sep 2023 | Paper | Primary
32 | Mistral 7B | 60.1% | 5-shot | 10 Oct 2023 | Paper | -
33 | Gemma 2 | 51.3% | 5-shot | 25 Feb 2025 | Self-reported | Primary