MMLU

Massive Multitask Language Understanding

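Multiple-choice questions across 57 academic subjects spanning the humanities, STEM, the social sciences, and professional fields. The standard setting is 5-shot accuracy, although many results below were reported under other conditions (e.g. 0-shot or chain-of-thought). The benchmark is largely saturated by frontier models.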

Knowledge · Text · Accuracy (max 100.0%) · Released Sep 2020 · Saturated · Possibly contaminated
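To make "5-shot accuracy" concrete, here is a minimal sketch of how a k-shot multiple-choice prompt can be assembled and scored. The header sentence and the "A. … / Answer:" layout follow the template commonly used for MMLU, but treat the exact wording as an assumption; the `Item` class, the helper names, and the toy questions are hypothetical and are not real MMLU items. In the standard setting, `shots` would hold five solved examples from the same subject.

```python
from dataclasses import dataclass

LETTERS = "ABCD"


@dataclass
class Item:
    question: str
    options: list[str]  # four answer options, in A-D order
    answer: str         # gold letter, one of "ABCD"


def format_item(item: Item, with_answer: bool) -> str:
    """Render one question with lettered options, ending in 'Answer:'."""
    lines = [item.question]
    lines += [f"{letter}. {text}" for letter, text in zip(LETTERS, item.options)]
    lines.append(f"Answer: {item.answer}\n" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(subject: str, shots: list[Item], query: Item) -> str:
    """k-shot prompt: header, k solved examples, then the unanswered query."""
    header = f"The following are multiple choice questions (with answers) about {subject}.\n\n"
    # Each solved example ends with "Answer: X\n"; the extra "\n" leaves a blank line.
    demos = "".join(format_item(s, with_answer=True) + "\n" for s in shots)
    return header + demos + format_item(query, with_answer=False)


def accuracy(predicted: list[str], items: list[Item]) -> float:
    """Fraction of items whose predicted letter matches the gold letter."""
    if len(predicted) != len(items):
        raise ValueError("need exactly one prediction per item")
    correct = sum(p.strip().upper() == it.answer for p, it in zip(predicted, items))
    return correct / len(items)


if __name__ == "__main__":
    # Hypothetical toy items, not real MMLU questions.
    toy = [
        Item("What is 2 + 2?", ["3", "4", "5", "6"], "B"),
        Item("What is 3 * 3?", ["6", "8", "9", "12"], "C"),
    ]
    print(build_prompt("elementary mathematics", toy[:1], toy[1]))
    print(accuracy(["B", "A"], toy))  # 0.5
```

The model's completion after the final "Answer:" is compared with the gold letter; accuracy is simply the fraction of matches over all questions.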


Results

Top primary score: 90.2% (GPT 4.1)

Median primary score: 79.1%
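As a cross-check on the summary figures, the sketch below recomputes the top and median scores from the 22 rows flagged "Primary" in the results table. It reproduces 79.1%, which suggests (but does not confirm) that the page takes the median over primary results only; the half-up rounding to one decimal place is likewise an assumption. `Decimal` is used so the midpoint 79.05 is represented exactly rather than as a nearby binary float.

```python
from decimal import ROUND_HALF_UP, Decimal
from statistics import median

# Primary-flagged (model, score %) rows from the "All results" table, one per model.
PRIMARY = [
    ("GPT 4.1", "90.2"), ("GPT-4o", "88.7"), ("Seed 1.5", "88.6"),
    ("Nova Premier", "87.4"), ("Claude Opus 3", "86.8"), ("GPT-4", "86.4"),
    ("Nemotron 3 Super", "86.0"), ("Llama 3.3", "86.0"), ("Nova Pro", "85.9"),
    ("Command A", "85.5"), ("Nova Lite", "80.5"), ("Nova Micro", "77.6"),
    ("Command R Plus", "75.7"), ("DBRX Instruct", "73.7"), ("Mixtral 8x7B", "70.6"),
    ("Mixtral 8x22B", "70.6"), ("Pixtral 12B", "69.2"), ("LLaMA 2", "68.9"),
    ("Mistral NeMo", "68.0"), ("Llama 3.2", "63.4"), ("Mistral 7B", "60.1"),
    ("Gemma 2", "51.3"),
]

scores = {name: Decimal(s) for name, s in PRIMARY}
top = max(scores, key=scores.get)
# With 22 values, the median is the mean of the 11th and 12th smallest scores.
med = median(scores.values())

print(f"Top: {top} ({scores[top]}%)")
print(f"Median: {med.quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)}%")
```

The two middle values are 77.6% (Nova Micro) and 80.5% (Nova Lite), whose mean is 79.05%.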

Best results

Top primary scores; one row per model.

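#	Model	Score
1	GPT 4.1	90.2%
2	GPT-4o	88.7%
3	Seed 1.5	88.6%
4	Nova Premier	87.4%
5	Claude Opus 3	86.8%
6	GPT-4	86.4%
7	Nemotron 3 Super	86.0%
8	Llama 3.3	86.0%
9	Nova Pro	85.9%
10	Command A	85.5%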
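The "Best results" ranking can be sketched as a filter-then-sort over the full results table: keep only rows flagged "Primary", retain each model's highest primary score, and sort descending. This is an illustrative reconstruction, not the site's code; `best_primary` is a hypothetical helper, and the rows are copied from the top of the "All results" table, including a few non-primary alternates to show that they are excluded.

```python
# (model, score %, is_primary) rows copied from the top of the "All results" table.
ROWS = [
    ("Claude Sonnet 3.5", 90.4, False),
    ("GPT 4.1", 90.2, True),
    ("Gemini Ultra", 90.0, False),
    ("GPT-4o", 88.7, True),
    ("Seed 1.5", 88.6, True),
    ("Nova Premier", 87.4, True),
    ("Claude Opus 3", 86.8, True),
    ("GPT-4", 86.4, True),
    ("Nemotron 3 Super", 86.0, True),
    ("Llama 3.3", 86.0, True),
    ("Nova Pro", 85.9, True),
    ("Gemini 1.5", 85.9, False),
    ("Command A", 85.5, True),
    ("Mistral Large", 81.2, False),
]


def best_primary(rows, n=10):
    """Top-n models by their best primary score, one entry per model."""
    best = {}
    for model, score, primary in rows:
        if primary and score > best.get(model, float("-inf")):
            best[model] = score
    # sorted() is stable even with reverse=True, so tied models keep table order.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:n]


for rank, (model, score) in enumerate(best_primary(ROWS), start=1):
    print(f"{rank}\t{model}\t{score:.1f}%")
```

This is why Claude Sonnet 3.5 (90.4%, not flagged primary) tops the full table but does not appear in the ranking.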

Frontier over time

Each dot is one model result; the line traces the running best score.
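The running-best line can be computed by sorting results by evaluation date and recording each result that strictly beats every earlier one. The sketch below uses a subset of rows from the "All results" table; whether the chart plots all configurations or only primary results is an assumption (here, all configurations). `frontier` is a hypothetical helper, and parsing `%b` month names assumes the default English ("C") locale.

```python
from datetime import datetime

# (model, score %, eval date) rows copied from the "All results" table (subset).
RESULTS = [
    ("GPT 3.5", 70.0, "14 Mar 2023"),
    ("Claude 2", 78.5, "11 Jul 2023"),
    ("LLaMA 2", 68.9, "19 Jul 2023"),
    ("Gemini Ultra", 90.0, "06 Dec 2023"),
    ("GPT-4", 86.4, "01 Jan 2024"),
    ("Claude Sonnet 3.5", 90.4, "20 Jun 2024"),
    ("GPT 4.1", 90.2, "14 Apr 2025"),
]


def frontier(results):
    """Return (date, model, score) for each result that beats every earlier one."""
    ordered = sorted(results, key=lambda r: datetime.strptime(r[2], "%d %b %Y"))
    steps, best = [], float("-inf")
    for model, score, date in ordered:
        if score > best:  # strict: a tie does not add a new frontier point
            best = score
            steps.append((date, model, score))
    return steps


for date, model, score in frontier(RESULTS):
    print(f"{date}  {model:<18} {score:.1f}%")
```

On these rows the line steps up at GPT 3.5, Claude 2, Gemini Ultra, and Claude Sonnet 3.5; later results such as GPT 4.1 (90.2%) fall just below the running best of 90.4%.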

All results

Showing all configurations, including non-primary alternates.

#	Model	Score	Conditions	Eval date	Source	Flags
1	Claude Sonnet 3.5	90.4%	0-shot · standard	20 Jun 2024	Self-reported
2	GPT 4.1	90.2%	—	14 Apr 2025	Self-reported	Primary
3	Gemini Ultra	90.0%	0-shot · CoT · standard	06 Dec 2023	Self-reported
4	GPT-4o	88.7%	—	16 Apr 2025	Self-reported	Primary
5	Seed 1.5	88.6%	—	22 Jan 2025	Self-reported	Primary
6	Nova Premier	87.4%	—	30 Apr 2025	Self-reported	Primary
7	Claude Opus 3	86.8%	—	04 Mar 2024	Self-reported	Primary
8	GPT-4	86.4%	5-shot	01 Jan 2024	Paper	Primary
9	Nemotron 3 Super	86.0%	5-shot	03 Apr 2026	Self-reported	Primary
10	Llama 3.3	86.0%	0-shot · CoT	06 Dec 2024	Self-reported	Primary
11	Nova Pro	85.9%	0-shot · CoT	03 Dec 2024	Self-reported	Primary
12	Gemini 1.5	85.9%	5-shot · standard	01 May 2024	Self-reported
13	Command A	85.5%	—	07 Apr 2025	Self-reported	Primary
14	Mistral Large	81.2%	5-shot	26 Feb 2024	Self-reported
15	Nova Lite	80.5%	0-shot · CoT	03 Dec 2024	Self-reported	Primary
16	Gemini 1.5 Flash	78.9%	5-shot · standard	01 May 2024	Self-reported
17	Claude 2	78.5%	5-shot · CoT · standard	11 Jul 2023	Self-reported
18	Nova Micro	77.6%	0-shot · CoT	03 Dec 2024	Self-reported	Primary
19	Command R Plus	75.7%	—	04 Apr 2024	Self-reported	Primary
20	Claude Haiku 3	75.2%	5-shot · standard	04 Mar 2024	Self-reported
21	DBRX Instruct	73.7%	5-shot	27 Mar 2024	Self-reported	Primary
22	Mixtral 8x7B	70.6%	—	01 Dec 2023	Paper	Primary
23	Mixtral 8x22B	70.6%	—	08 Jan 2024	Paper	Primary
24	Mixtral 8x7B	70.6%	5-shot	08 Jan 2024	Paper
25	GPT 3.5	70.0%	5-shot · standard	14 Mar 2023	Self-reported
26	Pixtral 12B	69.2%	5-shot	10 Oct 2024	Self-reported	Primary
27	LLaMA 2	68.9%	5-shot	19 Jul 2023	Paper	Primary Verified
28	LLaMA 2 70B	68.9%	5-shot	11 Jul 2023	Paper
29	Mistral NeMo	68.0%	5-shot	18 Jul 2024	Self-reported	Primary
30	Llama 3.2	63.4%	—	25 Sep 2024	Self-reported	Primary
31	Mistral 7B	60.1%	—	01 Sep 2023	Paper	Primary
32	Mistral 7B	60.1%	5-shot	10 Oct 2023	Paper
33	Gemma 2	51.3%	5-shot	25 Feb 2025	Self-reported	Primary
