TAAFT

AI benchmarks

Browse benchmarks tracking how AI models perform on reasoning, coding, multimodal and knowledge tasks. Each benchmark has its own leaderboard with the latest results from frontier models.

56 benchmarks · 8 categories · 538 results recorded · 113 models scored

Cross-benchmark model leaderboard

Compare how every tracked model ranks across the headline benchmarks in one matrix.
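One way to build such a matrix is sketched below in Python. It assumes results are stored as a nested dict of benchmark → model → score. Both that layout and the competition-style tie handling (tied scores share a rank, and a model with no result on a benchmark gets no rank) are illustrative assumptions, not necessarily how this site computes its leaderboard. `rank_matrix` is a hypothetical helper.

```python
from typing import Dict, Optional

# Hypothetical layout: {benchmark: {model: score}}.
Results = Dict[str, Dict[str, float]]


def rank_matrix(results: Results) -> Dict[str, Dict[str, Optional[int]]]:
    """Return {model: {benchmark: rank}} where rank 1 is the best score.

    Tied scores share a rank (competition ranking: 1, 1, 3). A model with
    no result on a benchmark gets None instead of a rank.
    """
    models = sorted({m for scores in results.values() for m in scores})
    matrix: Dict[str, Dict[str, Optional[int]]] = {m: {} for m in models}
    for bench, scores in results.items():
        for model in models:
            if model not in scores:
                matrix[model][bench] = None
                continue
            # Rank = 1 + number of models with a strictly higher score.
            better = sum(1 for s in scores.values() if s > scores[model])
            matrix[model][bench] = better + 1
    return matrix


if __name__ == "__main__":
    # Top-listed scores from the AIME 2025 and USAMO 2025 cards on this page.
    results = {
        "AIME 2025": {"Grok 4 Heavy": 100.0, "GPT 5.2 Thinking": 100.0,
                      "DeepSeek 3.2 Speciale": 96.0},
        "USAMO 2025": {"Grok 4 Heavy": 61.9, "Grok 4": 37.5},
    }
    for model, ranks in rank_matrix(results).items():
        print(model, ranks)
```

Because only each card's top entries appear here, these ranks are relative to that subset rather than to the full leaderboards.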


AIME 2024

American Invitational Mathematics Examination 2024

30 problems from the 2024 AIME I and II contests (15 per contest). It was the standard high-school competition math eval until AIME 2025 superseded it as the primary signal.

Tags: Math, Text · 19 results
Top results:
1. o4 mini: 93.4%
2. o3: 91.6%
3. Qwen3 235B A22B: 85.7%
Last eval: Apr 3, 2026

AIME 2025

American Invitational Mathematics Examination 2025

30 problems from the 2025 AIME I and II contests. High-school competition math with integer answers from 0 to 999; a valuable post-cutoff signal for models trained on 2024 data.
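Because every AIME answer is an integer from 0 to 999, grading reduces to an exact integer match. The sketch below assumes the model's final answer is the last integer in its response. That heuristic, and the helper names `parse_aime_answer` and `aime_accuracy`, are hypothetical; real harnesses usually require a fixed answer format.

```python
import re
from typing import Optional, Sequence


def parse_aime_answer(text: str) -> Optional[int]:
    """Return the last integer in a response, or None if absent or out of range."""
    matches = re.findall(r"-?\d+", text)
    if not matches:
        return None
    value = int(matches[-1])
    # AIME answers are integers from 0 to 999 inclusive.
    return value if 0 <= value <= 999 else None


def aime_accuracy(responses: Sequence[str], answers: Sequence[int]) -> float:
    """Percentage of problems whose parsed answer exactly matches the key."""
    if len(responses) != len(answers):
        raise ValueError("responses and answers must align one-to-one")
    correct = sum(parse_aime_answer(r) == a for r, a in zip(responses, answers))
    return 100.0 * correct / len(answers)
```

On a 30-problem set, each problem is worth 100/30 ≈ 3.33 percentage points in a single graded run.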

Tags: Math, Text · 37 results
Top results:
1. Grok 4 Heavy: 100.0%
2. GPT 5.2 Thinking: 100.0%
3. DeepSeek 3.2 Speciale: 96.0%
Last eval: May 5, 2026

GSM8K (saturated)

Grade School Math 8K

8.5k grade-school math word problems, each requiring 2 to 8 steps of arithmetic reasoning. All frontier models score near the ceiling, so today it is mostly useful as a smoke test.
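GSM8K reference solutions end with a line of the form `#### <number>`, so a smoke test only needs to compare one number per problem. The sketch below parses that gold answer and checks it against the last number in a model's output. The last-number rule, the tolerance, and the helper names `gold_answer` and `is_correct` are assumptions for illustration, not a standard harness.

```python
import re
from typing import Optional

# Gold answer: the number after "####" in a GSM8K reference solution.
_GOLD = re.compile(r"####\s*(-?[\d,]*\.?\d+)")
# Any number in free-form model output, allowing thousands separators.
_NUM = re.compile(r"-?\d[\d,]*\.?\d*")


def gold_answer(reference_solution: str) -> Optional[float]:
    """Parse the number after '####', stripping thousands separators."""
    m = _GOLD.search(reference_solution)
    return float(m.group(1).replace(",", "")) if m else None


def is_correct(model_output: str, reference_solution: str, tol: float = 1e-6) -> bool:
    """True if the LAST number in the output equals the gold answer.

    Taking the last number is a common heuristic (an assumption here).
    """
    gold = gold_answer(reference_solution)
    found = _NUM.findall(model_output)
    if gold is None or not found:
        return False
    pred = float(found[-1].replace(",", ""))
    return abs(pred - gold) < tol
```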

Tags: Math, Text · 12 results
Top results:
1. Claude Opus 3: 95.0%
2. Nova Pro: 94.8%
3. Nova Lite: 94.5%
Last eval: Apr 3, 2026

MATH (saturated)

MATH (Hendrycks)

12.5k competition mathematics problems (AMC, AIME, USAMO style). Reported as overall % or split by Level 1-5 difficulty. The "easy" levels are now saturated; Level 5 still discri…
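Since MATH results are reported either overall or by difficulty level, it helps to see both views side by side. The sketch below computes both from per-problem (level, correct) records. The record format and the `accuracy_by_level` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_level(records: Iterable[Tuple[int, bool]]) -> Tuple[float, Dict[int, float]]:
    """records: (level 1-5, is_correct) pairs -> (overall %, {level: %})."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for level, correct in records:
        if not 1 <= level <= 5:
            raise ValueError(f"MATH levels run from 1 to 5, got {level}")
        totals[level] += 1
        hits[level] += int(correct)
    n = sum(totals.values())
    if n == 0:
        raise ValueError("no records")
    # Overall is a micro-average: every problem counts equally, whatever its level.
    overall = 100.0 * sum(hits.values()) / n
    per_level = {lvl: 100.0 * hits[lvl] / totals[lvl] for lvl in sorted(totals)}
    return overall, per_level
```

Because the overall figure weights each problem equally, strong scores on the easier levels can mask weak Level 5 performance. The per-level split exposes that gap.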

Tags: Math, Text · 17 results
Top results:
1. Seed 1.5: 88.6%
2. Nemotron 3 Super: 84.8%
3. Command A: 80.0%
Last eval: Apr 3, 2026

MATH-500

MATH-500 (OpenAI subset)

500-question subset of MATH, popularised by OpenAI's o-series releases and now widely reported as the standard 'MATH' number on modern leaderboards.

Tags: Math, Text · 6 results
Top results:
1. DeepSeek-R1: 97.3%
2. Claude Sonnet 3.7 (Thinking): 96.2%
3. Llama 4 Behemoth: 95.0%
Last eval: Apr 30, 2025

USAMO 2025

USA Mathematical Olympiad 2025

Six proof-based problems from the 2025 USAMO. Graded out of 42 (7 points per problem) by expert judges.
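The 42-point maximum follows from 6 problems × 7 points. The sketch below converts judges' per-problem marks (integers from 0 to 7) into a raw total and a percentage of the maximum, since leaderboards may report either. The `usamo_score` helper is hypothetical, and it does not model how marks from several judges or sampled runs are combined.

```python
from typing import Sequence, Tuple

NUM_PROBLEMS = 6
POINTS_PER_PROBLEM = 7
MAX_SCORE = NUM_PROBLEMS * POINTS_PER_PROBLEM  # 42


def usamo_score(problem_points: Sequence[int]) -> Tuple[int, float]:
    """Return (raw points out of 42, percentage of the 42-point maximum)."""
    if len(problem_points) != NUM_PROBLEMS:
        raise ValueError(f"expected {NUM_PROBLEMS} problem scores, got {len(problem_points)}")
    for p in problem_points:
        if not 0 <= p <= POINTS_PER_PROBLEM:
            raise ValueError(f"each problem is scored 0-{POINTS_PER_PROBLEM}, got {p}")
    total = sum(problem_points)
    return total, 100.0 * total / MAX_SCORE
```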

Tags: Math, Text · 2 results
Top results:
1. Grok 4 Heavy: 61.9%
2. Grok 4: 37.5%
Last eval: Sep 7, 2025