AIME 2024
30 problems from AIME I and II 2024. This was the standard high-school competition math eval until AIME 2025 superseded it as the primary signal.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | o4 mini | 93.4% | — | Apr 16, 2025 | self reported | primary |
| 2 | o3 | 91.6% | — | Apr 16, 2025 | self reported | primary |
| 3 | Qwen3 235B A22B | 85.7% | — | Apr 28, 2025 | self reported | primary |
| 4 | Phi 4 reasoning plus | 81.3% | CoT | Jul 8, 2025 | self reported | primary |
| 5 | Qwen3-30B-A3B | 80.4% | — | Apr 28, 2025 | paper | primary |
| 6 | Qwen3 30B A3B | 80.4% | — | Apr 28, 2025 | self reported | primary |
| 7 | DeepSeek-R1 | 79.8% | CoT | Jan 21, 2025 | paper | primary |
| 8 | o1 | 74.3% | — | Apr 16, 2025 | self reported | primary |
| 9 | Magistral Medium | 73.6% | CoT | Jun 10, 2025 | self reported | primary |
| 10 | Kimi K2 | 69.6% | — | Jul 11, 2025 | self reported | primary verified |
| 11 | Claude Sonnet 3.7 (Thinking) | 61.3% | — | Feb 24, 2025 | self reported | primary |
| 12 | Nemotron 3 Super | 53.3% | pass@32 | Apr 3, 2026 | self reported | primary |
| 13 | Grok 3 | 52.2% | — | Feb 19, 2025 | self reported | primary |
| 14 | GPT 4.1 | 48.1% | — | Apr 14, 2025 | self reported | primary |
| 15 | Grok 3 mini | 39.7% | — | Feb 19, 2025 | self reported | primary |
| 16 | DeepSeek V3 | 39.2% | — | Dec 26, 2024 | paper | primary |
| 17 | Claude Sonnet 3.7 | 23.3% | — | Feb 24, 2025 | self reported | primary |
| 18 | Claude Haiku 3.5 | 5.3% | 0-shot · CoT | Oct 22, 2024 | self reported | primary |
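
The Conditions column mixes protocols. One row is marked pass@32, where a problem counts as solved if any of 32 sampled attempts is correct, which is more lenient than single-attempt accuracy, so that score is not directly comparable to the others. With only 30 problems, one problem is worth about 3.33 points; scores that are not multiples of that are presumably averaged over several samples per problem. The sketch below uses the unbiased pass@k estimator popularized by the Codex paper (Chen et al., 2021). It is an assumption that any given row was computed this way, and the function names are illustrative, not from any source.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples drawn, c: number of correct samples, k: attempts allowed.
    Returns the probability that at least one of k samples (drawn without
    replacement from the n) is correct.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(per_problem: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over all problems, as a percentage.

    per_problem: one (n, c) pair per problem (hypothetical sample counts).
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)


# Hypothetical example: 64 samples on a problem, 16 of them correct.
print(pass_at_k(64, 16, 1))   # single-attempt estimate
print(pass_at_k(64, 16, 32))  # far higher under pass@32
```

With k = 1 the estimator reduces to c / n, the plain fraction of correct samples, which is why pass@1 and pass@32 figures for the same model can differ widely.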
