MATH (Hendrycks)

12,500 competition mathematics problems in the style of AMC, AIME, and USAMO. Scores are reported as overall accuracy (%) or split by difficulty, Level 1 to Level 5. The easier levels are now saturated; Level 5 still separates models.
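The two reporting modes are related as follows. This is a sketch over a hypothetical list of graded `(level, correct)` pairs; the data and the `accuracy_by_level` helper are illustrative, not taken from the benchmark. Overall accuracy pools every problem, so it is not simply the mean of the five per-level figures when the levels contain different numbers of problems.

```python
from collections import defaultdict

# Hypothetical graded results: (difficulty level 1-5, answer judged correct?).
results = [
    (1, True), (1, True), (2, True), (3, False),
    (4, True), (5, False), (5, True), (5, False),
]

def accuracy_by_level(rows):
    """Return (overall accuracy, {level: accuracy}), both in percent."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for level, ok in rows:
        totals[level] += 1
        correct[level] += int(ok)
    # Overall accuracy is pooled over all problems, not averaged over levels.
    overall = 100.0 * sum(correct.values()) / sum(totals.values())
    per_level = {lvl: 100.0 * correct[lvl] / totals[lvl] for lvl in sorted(totals)}
    return overall, per_level

overall, per_level = accuracy_by_level(results)
print(f"overall: {overall:.1f}%")
for lvl, acc in per_level.items():
    print(f"Level {lvl}: {acc:.1f}%")
```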

Math · Text · Accuracy (max 100.0%) · Released Mar 2021 · Saturated · Possibly contaminated
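Accuracy is the share of problems whose final answer matches the reference. MATH reference solutions mark the final answer with `\boxed{...}`. The sketch below assumes the model is prompted to do the same, and it scores by exact match after a deliberately minimal normalization. The helper names `last_boxed`, `normalize`, and `is_correct` are hypothetical. Published graders typically normalize much more (fraction forms, units, spacing), so a checker this strict will undercount some correct answers.

```python
from typing import Optional

BOXED = "\\boxed{"

def last_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `solution`, honoring nested braces."""
    start = solution.rfind(BOXED)
    if start == -1:
        return None
    i = start + len(BOXED)
    content_start, depth = i, 1
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return solution[content_start:i]
        i += 1
    return None  # braces never closed

def normalize(answer: str) -> str:
    """Minimal normalization (an assumption): drop all whitespace and a trailing period."""
    return "".join(answer.split()).rstrip(".")

def is_correct(model_output: str, reference_solution: str) -> bool:
    """Exact match of the normalized final boxed answers."""
    pred = last_boxed(model_output)
    gold = last_boxed(reference_solution)
    return pred is not None and gold is not None and normalize(pred) == normalize(gold)

print(last_boxed(r"so $x = \boxed{\frac{1}{2}}$."))
```

Benchmark accuracy is then 100 times the fraction of problems for which `is_correct` returns `True`.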


Results

Top score: 88.6% (Seed 1.5)

Median score: 69.2%
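The top (88.6%, Seed 1.5) and median (69.2%) figures can be recomputed from the primary scores in the "All results" table. This sketch assumes the median is taken over all 17 table rows, counting both Gemma 2 rows. Under that assumption it reproduces the 69.2% shown.

```python
import statistics

# (model, primary score %) copied from the "All results" table, in table order.
rows = [
    ("Seed 1.5", 88.6), ("Nemotron 3 Super", 84.8), ("Command A", 80.0),
    ("Llama 3.3", 77.0), ("Nova Pro", 76.6), ("GPT-4o", 76.6),
    ("Nova Lite", 73.3), ("Nova Micro", 69.3), ("Claude Haiku 3.5", 69.2),
    ("Claude Opus 3", 60.1), ("Pixtral 12B", 48.1), ("Llama 3.2", 48.0),
    ("Mixtral 8x22B", 28.4), ("Mixtral 8x7B", 28.4), ("Gemma 2", 15.0),
    ("Gemma 2", 15.0), ("Mistral 7B", 13.1),
]

top_model, top_score = max(rows, key=lambda r: r[1])
# With 17 values, the median is the 9th score in sorted order.
median_score = statistics.median(score for _, score in rows)

print(f"Top: {top_score}% ({top_model})")  # Top: 88.6% (Seed 1.5)
print(f"Median: {median_score}%")          # Median: 69.2%
```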

Best results

[Chart: top primary scores, one row per model. The ten values shown, 88.6% down to 60.1%, match rows 1 to 10 of the table below.]

Frontier over time

[Chart: each dot is one model result; the line traces the running best score.]
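The running-best line can be recomputed from the eval dates in the "All results" table. The sketch below sorts results by date and keeps each result that strictly beats every earlier one. Two choices are assumptions, not documented by the page: a tie does not move the frontier, and results sharing a date keep their table order. The helper `running_best` is hypothetical.

```python
from datetime import date

# (model, eval date, primary score %) in the order of the "All results" table.
results = [
    ("Seed 1.5", date(2025, 1, 22), 88.6),
    ("Nemotron 3 Super", date(2026, 4, 3), 84.8),
    ("Command A", date(2025, 4, 7), 80.0),
    ("Llama 3.3", date(2024, 12, 6), 77.0),
    ("Nova Pro", date(2024, 12, 3), 76.6),
    ("GPT-4o", date(2025, 4, 16), 76.6),
    ("Nova Lite", date(2024, 12, 3), 73.3),
    ("Nova Micro", date(2024, 12, 3), 69.3),
    ("Claude Haiku 3.5", date(2024, 10, 22), 69.2),
    ("Claude Opus 3", date(2024, 10, 22), 60.1),
    ("Pixtral 12B", date(2024, 10, 10), 48.1),
    ("Llama 3.2", date(2024, 10, 25), 48.0),
    ("Mixtral 8x22B", date(2024, 1, 8), 28.4),
    ("Mixtral 8x7B", date(2023, 12, 1), 28.4),
    ("Gemma 2", date(2025, 2, 25), 15.0),
    ("Gemma 2", date(2025, 2, 25), 15.0),
    ("Mistral 7B", date(2023, 9, 1), 13.1),
]

def running_best(rows):
    """Return the results that set a new best score, in chronological order."""
    frontier = []
    best = float("-inf")
    # sorted() is stable, so results sharing a date keep their table order.
    for model, when, score in sorted(rows, key=lambda r: r[1]):
        if score > best:  # strict: a tie does not move the frontier
            best = score
            frontier.append((model, when, score))
    return frontier

for model, when, score in running_best(results):
    print(f"{when.isoformat()}  {score:5.1f}%  {model}")
```

Under these assumptions, the frontier steps through Mistral 7B, Mixtral 8x7B, Pixtral 12B, Claude Haiku 3.5, Nova Pro, Llama 3.3, and Seed 1.5.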

All results

Showing one canonical row per model.

#	Model	Score	Conditions	Eval date	Source	Flags
1	Seed 1.5	88.6%	—	22 Jan 2025	Self-reported	Primary
2	Nemotron 3 Super	84.8%	4-shot	03 Apr 2026	Self-reported	Primary
3	Command A	80.0%	—	07 Apr 2025	Self-reported	Primary
4	Llama 3.3	77.0%	0-shot · CoT	06 Dec 2024	Self-reported	Primary
5	Nova Pro	76.6%	0-shot · CoT	03 Dec 2024	Self-reported	Primary
6	GPT-4o	76.6%	—	16 Apr 2025	Self-reported	Primary
7	Nova Lite	73.3%	0-shot · CoT	03 Dec 2024	Self-reported	Primary
8	Nova Micro	69.3%	0-shot · CoT	03 Dec 2024	Self-reported	Primary
9	Claude Haiku 3.5	69.2%	0-shot · CoT	22 Oct 2024	Self-reported	Primary
10	Claude Opus 3	60.1%	0-shot · CoT	22 Oct 2024	Self-reported	Primary
11	Pixtral 12B	48.1%	Maj@1	10 Oct 2024	Self-reported	Primary
12	Llama 3.2	48.0%	0-shot · CoT	25 Oct 2024	Self-reported	Primary
13	Mixtral 8x22B	28.4%	—	08 Jan 2024	Paper	Primary
14	Mixtral 8x7B	28.4%	—	01 Dec 2023	Self-reported	Primary
15	Gemma 2	15.0%	—	25 Feb 2025	Self-reported	Primary
16	Gemma 2	15.0%	4-shot	25 Feb 2025	Self-reported	Primary
17	Mistral 7B	13.1%	—	01 Sep 2023	Self-reported	Primary
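The page does not say how the canonical row is chosen when a model has several configurations. One plausible rule, assumed here and not confirmed by the source, is to keep each model's highest-scoring row and break ties by first appearance. Applied to a few rows from the table, this rule collapses the two Gemma 2 entries (no conditions and 4-shot, both 15.0%) into one. `Row` and `canonical_rows` are hypothetical names.

```python
from typing import NamedTuple, Optional

class Row(NamedTuple):
    model: str
    conditions: Optional[str]  # None where the table lists no conditions
    score: float

# A few rows from the "All results" table.
rows = [
    Row("Gemma 2", None, 15.0),
    Row("Gemma 2", "4-shot", 15.0),
    Row("Mistral 7B", None, 13.1),
    Row("Pixtral 12B", "Maj@1", 48.1),
]

def canonical_rows(rows):
    """Keep one row per model: highest score, first-seen row on ties (assumed rule)."""
    best = {}
    for row in rows:
        kept = best.get(row.model)
        if kept is None or row.score > kept.score:
            best[row.model] = row
    # Order by score, highest first, as the leaderboard does.
    return sorted(best.values(), key=lambda r: r.score, reverse=True)

for r in canonical_rows(rows):
    print(r.model, r.score, r.conditions or "no conditions listed")
```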
