ARC-AGI-2
Abstract visual-grid puzzles designed to resist memorisation. Each task can be solved by humans from a few examples; LLMs typically struggle without test-time adaptation.
Best results
Frontier over time
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT 5.5 | 85.0% | CoT | 23 Apr 2026 | Self-reported | Primary |
| 2 | GPT 5.4 Pro | 83.3% | CoT | 05 Mar 2026 | Self-reported | Primary |
| 3 | Gemini 3.1 Pro | 77.1% | CoT | 19 Feb 2026 | Self-reported | Primary |
| 4 | GPT 5.4 | 73.3% | CoT | 05 Mar 2026 | Self-reported | Primary |
| 5 | Claude Opus 4.6 | 69.2% | — | 05 Feb 2026 | Self-reported | Primary |
| 6 | Claude Sonnet 4.6 | 58.3% | CoT | 17 Feb 2026 | Self-reported | Primary |
| 7 | GPT 5.2 Pro | 54.2% | — | 11 Dec 2025 | Self-reported | Primary |
| 8 | GPT 5.2 Thinking | 52.9% | CoT | 11 Dec 2025 | Self-reported | Primary |
| 9 | Gemini 3 Deep Think | 45.1% | CoT | 12 Feb 2026 | Self-reported | Primary |
| 10 | Muse Spark | 42.5% | CoT | 08 Apr 2026 | Self-reported | Primary |
| 11 | Claude Opus 4.5 | 37.6% | — | 24 Nov 2025 | Self-reported | Primary |
| 12 | Gemini 3 Flash (Thinking) | 33.6% | — | 17 Dec 2025 | Self-reported | Primary |
| 13 | Gemini 3 Pro | 31.1% | CoT | 18 Nov 2025 | Self-reported | Primary |
| 14 | GPT 5.1 | 17.6% | — | 13 Nov 2025 | Self-reported | Primary |
| 15 | Grok 4 | 15.9% | CoT | 09 Jul 2025 | Self-reported | Primary |
| 16 | Gemini 2.5 Pro (Thinking) | 4.9% | — | 17 Dec 2025 | Self-reported | Primary |
| 17 | Gemini 2.5 Flash (Thinking) | 2.5% | — | 17 Dec 2025 | Self-reported | Primary |
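
A "frontier over time" view can be derived from the rows above by walking them in date order and keeping each result that beats every earlier score. The sketch below is illustrative only: the names `RESULTS`, `parse_date` and `frontier` are hypothetical, the data is copied from the table (all scores self-reported), and the rule that only the top score counts when several results share an evaluation date is an assumption.

```python
from datetime import date

# Month abbreviations as they appear in the "Eval date" column.
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# (model, score in percent, eval date), copied from the table above.
RESULTS = [
    ("GPT 5.5", 85.0, "23 Apr 2026"),
    ("GPT 5.4 Pro", 83.3, "05 Mar 2026"),
    ("Gemini 3.1 Pro", 77.1, "19 Feb 2026"),
    ("GPT 5.4", 73.3, "05 Mar 2026"),
    ("Claude Opus 4.6", 69.2, "05 Feb 2026"),
    ("Claude Sonnet 4.6", 58.3, "17 Feb 2026"),
    ("GPT 5.2 Pro", 54.2, "11 Dec 2025"),
    ("GPT 5.2 Thinking", 52.9, "11 Dec 2025"),
    ("Gemini 3 Deep Think", 45.1, "12 Feb 2026"),
    ("Muse Spark", 42.5, "08 Apr 2026"),
    ("Claude Opus 4.5", 37.6, "24 Nov 2025"),
    ("Gemini 3 Flash (Thinking)", 33.6, "17 Dec 2025"),
    ("Gemini 3 Pro", 31.1, "18 Nov 2025"),
    ("GPT 5.1", 17.6, "13 Nov 2025"),
    ("Grok 4", 15.9, "09 Jul 2025"),
    ("Gemini 2.5 Pro (Thinking)", 4.9, "17 Dec 2025"),
    ("Gemini 2.5 Flash (Thinking)", 2.5, "17 Dec 2025"),
]


def parse_date(text):
    """Parse a 'DD Mon YYYY' string without relying on the system locale."""
    day, month, year = text.split()
    return date(int(year), MONTHS[month], int(day))


def frontier(results):
    """Return (date, model, score) for each result that set a new best score."""
    # Oldest first; on the same date, highest score first (assumed tie rule).
    ordered = sorted(results, key=lambda r: (parse_date(r[2]), -r[1]))
    best = float("-inf")
    record_setters = []
    for model, score, eval_date in ordered:
        if score > best:
            best = score
            record_setters.append((parse_date(eval_date), model, score))
    return record_setters


if __name__ == "__main__":
    for day, model, score in frontier(RESULTS):
        print(f"{day.isoformat()}  {score:5.1f}%  {model}")
```

Because the frontier is computed only from the rows listed here, any result missing from the table would also be missing from it.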
