AIME 2025
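30 problems from the 2025 AIME I and II contests (15 each). High-school competition math with integer answers from 0 to 999. Because the problems were published in 2025, they postdate the training data of models trained through 2024, which makes the benchmark a useful post-cutoff signal for those models.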
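As a rough illustration of how a score on this benchmark might be computed, the sketch below grades answers by exact integer match and averages over several runs. The function names (`is_valid_answer`, `grade_run`, `mean_accuracy`) are hypothetical and not taken from any particular evaluation harness. A single pass over 30 problems can only produce multiples of 1/30 (about 3.3 percentage points), so entries such as 94.6% presumably come from averaging over multiple samples or from a different protocol.

```python
from fractions import Fraction

NUM_PROBLEMS = 30  # AIME I (15 problems) + AIME II (15 problems)


def is_valid_answer(answer: object) -> bool:
    """AIME answers are integers from 0 to 999 inclusive."""
    return isinstance(answer, int) and not isinstance(answer, bool) and 0 <= answer <= 999


def grade_run(predictions: list[int], reference: list[int]) -> Fraction:
    """Exact-match accuracy for one pass over the problem set."""
    if len(predictions) != len(reference):
        raise ValueError("predictions and reference must have the same length")
    correct = sum(
        1 for p, r in zip(predictions, reference) if is_valid_answer(p) and p == r
    )
    return Fraction(correct, len(reference))


def mean_accuracy(runs: list[list[int]], reference: list[int]) -> float:
    """Average exact-match accuracy over several independent runs."""
    if not runs:
        raise ValueError("at least one run is required")
    scores = [grade_run(run, reference) for run in runs]
    return float(sum(scores) / len(scores))
```

With a placeholder answer key, one perfect run and one run with two misses average to 58/60, a value no single run could produce on its own.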
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Grok 4 Heavy | 100.0% | CoT | Jul 9, 2025 | self reported | primary |
| 2 | GPT 5.2 Thinking | 100.0% | CoT | Dec 11, 2025 | self reported | primary |
| 3 | DeepSeek 3.2 Speciale | 96.0% | — | Dec 1, 2025 | paper | primary |
| 4 | Gemini 3 Flash (Thinking) | 95.2% | — | Dec 17, 2025 | self reported | primary |
| 5 | Gemini 3 Pro | 95.0% | CoT | Nov 18, 2025 | self reported | primary |
| 6 | GPT 5.1 | 94.6% | 0-shot · CoT | Nov 13, 2025 | self reported | primary |
| 7 | GPT 5.1 Thinking | 94.6% | CoT | Nov 12, 2025 | self reported | primary |
| 8 | GPT 5 (Thinking) | 94.6% | — | Aug 7, 2025 | self reported | primary |
| 9 | GLM 4.6 | 93.9% | CoT | Sep 30, 2025 | self reported | primary |
| 10 | Grok 3 Think | 93.3% | CoT · cons@64 | Feb 19, 2025 | self reported | primary |
| 11 | Grok 3 Think | 93.3% | — | Feb 18, 2025 | self reported | primary |
| 12 | DeepSeek 3.2 | 93.1% | — | Dec 1, 2025 | paper | primary |
| 13 | o4 mini | 92.7% | — | Apr 16, 2025 | self reported | primary |
| 14 | Grok 4 | 91.7% | CoT | Jul 9, 2025 | self reported | primary |
| 15 | DeepSeek V3.2 Exp | 89.3% | CoT | Sep 29, 2025 | self reported | primary |
| 16 | Nemotron 3 Nano | 89.1% | — | Dec 15, 2025 | self reported | primary |
| 17 | o3 | 88.9% | — | Apr 16, 2025 | self reported | primary |
| 18 | DeepSeek V3.1 Terminus | 88.4% | — | Sep 22, 2025 | self reported | primary |
| 19 | Gemini 2.5 Pro (Thinking) | 88.0% | — | Dec 17, 2025 | self reported | primary |
| 20 | Claude Sonnet 4.5 | 87.0% | — | Sep 29, 2025 | self reported | primary |
| 21 | Gemini 2.5 Pro | 86.7% | CoT | May 17, 2025 | self reported | primary |
| 22 | Qwen3-235B-A22B | 81.5% | CoT | Apr 28, 2025 | self reported | primary |
| 23 | Qwen3 235B A22B | 81.5% | — | Apr 28, 2025 | self reported | primary |
| 24 | GPT 5.5 Instant | 81.2% | 0-shot | May 5, 2026 | self reported | primary |
| 25 | Claude Haiku 4.5 | 80.7% | — | Oct 15, 2025 | self reported | primary |
| 26 | o1 | 79.2% | — | Apr 16, 2025 | self reported | primary |
| 27 | Phi 4 reasoning plus | 78.0% | CoT | Jul 8, 2025 | self reported | primary |
| 28 | Gemini 2.5 Flash (Thinking) | 72.0% | — | Dec 17, 2025 | self reported | primary |
| 29 | Qwen3 30B A3B | 70.9% | — | Apr 28, 2025 | self reported | primary |
| 30 | Claude Sonnet 4 | 70.5% | — | May 22, 2025 | self reported | primary |
| 31 | R1 1776 | 70.0% | — | Feb 18, 2025 | self reported | primary |
| 32 | DeepSeek-R1 | 70.0% | CoT | Jan 21, 2025 | self reported | primary |
| 33 | Magistral Medium | 64.9% | CoT | Jun 10, 2025 | self reported | primary |
| 34 | GPT 5 | 61.9% | — | Aug 7, 2025 | self reported | primary |
| 35 | Gemini 2.5 Flash-Lite | 49.8% | — | Sep 26, 2025 | self reported | primary |
| 36 | Kimi K2 Instruct | 49.5% | — | Jul 2, 2025 | paper | primary |
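
The `cons@64` condition in the table is commonly read as consensus (majority-vote) scoring over 64 sampled answers per problem. The leaderboard itself does not define it, so treat that reading as an assumption. A minimal sketch, with hypothetical helper names and an arbitrary tie-break (the answer seen first wins):

```python
from collections import Counter


def consensus_answer(samples: list[int]) -> int:
    """Return the most frequent answer among the samples for one problem."""
    if not samples:
        raise ValueError("need at least one sample")
    # most_common orders equal counts by first occurrence, so ties
    # fall to the answer that was sampled first.
    return Counter(samples).most_common(1)[0][0]


def cons_at_k(samples_per_problem: list[list[int]], reference: list[int]) -> float:
    """Fraction of problems whose majority-vote answer matches the key."""
    if len(samples_per_problem) != len(reference):
        raise ValueError("one sample list is required per problem")
    correct = sum(
        consensus_answer(samples) == answer
        for samples, answer in zip(samples_per_problem, reference)
    )
    return correct / len(reference)
```

Under this reading, a `cons@64` score is not directly comparable to a single-sample score, since voting can recover problems that any one sample gets wrong.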
