
ARC-AGI-2

Abstraction and Reasoning Corpus — AGI v2

Abstract visual-grid puzzles designed to resist memorisation. Each task can be solved by humans from a few examples; LLMs typically struggle without test-time adaptation.

Reasoning | Text | Accuracy | Max 100.0% | Released Mar 2025
Results: 17
Models scored: 17
Top score: 85.0% (GPT 5.5)
Median score: 45.1%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: best score over time. Y-axis: score, 0 to 100. X-axis: evaluation date, Jul 2025 to Apr 2026.]
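The running-best line can be sketched as a cumulative maximum over results ordered by evaluation date. The snippet below is a minimal illustration, not the page's own plotting code: the data is copied from the results table on this page, the function name `running_best` is hypothetical, and collapsing same-day results to the day's best score is an assumption about how ties are handled.

```python
from datetime import date

# (model, score in %, eval date), copied from the results table on this page.
RESULTS = [
    ("GPT 5.5", 85.0, date(2026, 4, 23)),
    ("GPT 5.4 Pro", 83.3, date(2026, 3, 5)),
    ("Gemini 3.1 Pro", 77.1, date(2026, 2, 19)),
    ("GPT 5.4", 73.3, date(2026, 3, 5)),
    ("Claude Opus 4.6", 69.2, date(2026, 2, 5)),
    ("Claude Sonnet 4.6", 58.3, date(2026, 2, 17)),
    ("GPT 5.2 Pro", 54.2, date(2025, 12, 11)),
    ("GPT 5.2 Thinking", 52.9, date(2025, 12, 11)),
    ("Gemini 3 Deep Think", 45.1, date(2026, 2, 12)),
    ("Muse Spark", 42.5, date(2026, 4, 8)),
    ("Claude Opus 4.5", 37.6, date(2025, 11, 24)),
    ("Gemini 3 Flash (Thinking)", 33.6, date(2025, 12, 17)),
    ("Gemini 3 Pro", 31.1, date(2025, 11, 18)),
    ("GPT 5.1", 17.6, date(2025, 11, 13)),
    ("Grok 4", 15.9, date(2025, 7, 9)),
    ("Gemini 2.5 Pro (Thinking)", 4.9, date(2025, 12, 17)),
    ("Gemini 2.5 Flash (Thinking)", 2.5, date(2025, 12, 17)),
]


def running_best(results):
    """Return [(eval_date, best_score_so_far)], one entry per distinct date."""
    # Assumption: results sharing a date are collapsed to that day's best score.
    best_by_date = {}
    for _model, score, day in results:
        best_by_date[day] = max(score, best_by_date.get(day, score))

    frontier = []
    best = float("-inf")
    for day in sorted(best_by_date):
        # The frontier never decreases: keep the highest score seen so far.
        best = max(best, best_by_date[day])
        frontier.append((day, best))
    return frontier


frontier = running_best(RESULTS)
for day, best in frontier:
    print(day.isoformat(), f"{best:.1f}%")
```

Under these assumptions the frontier starts at 15.9% (Grok 4, 09 Jul 2025) and ends at 85.0% (GPT 5.5, 23 Apr 2026). Results that fall below the current best, such as the 17 Dec 2025 entries, leave the line flat.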

All results

Showing one canonical row per model.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|-------|-------|------------|-----------|--------|-------|
| 1 | GPT 5.5 | 85.0% | CoT | 23 Apr 2026 | Self-reported | Primary |
| 2 | GPT 5.4 Pro | 83.3% | CoT | 05 Mar 2026 | Self-reported | Primary |
| 3 | Gemini 3.1 Pro | 77.1% | CoT | 19 Feb 2026 | Self-reported | Primary |
| 4 | GPT 5.4 | 73.3% | CoT | 05 Mar 2026 | Self-reported | Primary |
| 5 | Claude Opus 4.6 | 69.2% | | 05 Feb 2026 | Self-reported | Primary |
| 6 | Claude Sonnet 4.6 | 58.3% | CoT | 17 Feb 2026 | Self-reported | Primary |
| 7 | GPT 5.2 Pro | 54.2% | | 11 Dec 2025 | Self-reported | Primary |
| 8 | GPT 5.2 Thinking | 52.9% | CoT | 11 Dec 2025 | Self-reported | Primary |
| 9 | Gemini 3 Deep Think | 45.1% | CoT | 12 Feb 2026 | Self-reported | Primary |
| 10 | Muse Spark | 42.5% | CoT | 08 Apr 2026 | Self-reported | Primary |
| 11 | Claude Opus 4.5 | 37.6% | | 24 Nov 2025 | Self-reported | Primary |
| 12 | Gemini 3 Flash (Thinking) | 33.6% | | 17 Dec 2025 | Self-reported | Primary |
| 13 | Gemini 3 Pro | 31.1% | CoT | 18 Nov 2025 | Self-reported | Primary |
| 14 | GPT 5.1 | 17.6% | | 13 Nov 2025 | Self-reported | Primary |
| 15 | Grok 4 | 15.9% | CoT | 09 Jul 2025 | Self-reported | Primary |
| 16 | Gemini 2.5 Pro (Thinking) | 4.90% | | 17 Dec 2025 | Self-reported | Primary |
| 17 | Gemini 2.5 Flash (Thinking) | 2.50% | | 17 Dec 2025 | Self-reported | Primary |
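
The summary figures at the top of the page (17 models scored, top score 85.0%, median 45.1%) can be cross-checked against the rows above. This is a small sketch using only the Python standard library; the variable names are illustrative, and the scores are copied from the table in rank order.

```python
from statistics import median

# Scores in %, copied from the table above in rank order (1 to 17).
scores = [
    85.0, 83.3, 77.1, 73.3, 69.2, 58.3, 54.2, 52.9, 45.1,
    42.5, 37.6, 33.6, 31.1, 17.6, 15.9, 4.9, 2.5,
]

models_scored = len(scores)
top_score = max(scores)
# With an odd number of rows, the median is the middle (9th) score.
median_score = median(scores)

print(f"Models scored: {models_scored}")
print(f"Top: {top_score:.1f}%")
print(f"Median: {median_score:.1f}%")
```

With 17 rows the median is simply the 9th-ranked score, which is Gemini 3 Deep Think at 45.1%.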