ARC Challenge

AI2 Reasoning Challenge (Challenge set)

Grade-school science multiple-choice, hard subset. Saturated by frontier models but still in many evaluation harnesses.

Knowledge · Text · Accuracy · Max 100.0% · Released Mar 2018 · Saturated · Possibly contaminated

Results: 10
Models scored: 9
Top: Gemma 2 (554.0%)
Median: 91.3%
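The median above can be reproduced from the listed scores. The following is a minimal sketch, assuming the median is taken over all ten result rows (not the nine distinct models); the variable names are illustrative and not part of the leaderboard.

```python
from statistics import median

# Scores (percent) copied from the "All results" table, one per listed row.
scores = [554.0, 96.4, 96.1, 94.8, 92.4, 90.2, 78.6, 59.7, 59.7, 55.6]

# With ten values, the median is the mean of the 5th and 6th sorted scores
# (90.2 and 92.4), so a single extreme value at the top does not move it.
median_score = round(median(scores), 1)
print(f"Median: {median_score}%")
```

Under that assumption the result agrees with the 91.3% shown on the page.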

Best results

Top primary scores; one row per model.

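#	Model	Score
1	Gemma 2	554.0%
2	Claude Opus 3	96.4%
3	Nemotron 3 Super	96.1%
4	Nova Pro	94.8%
5	Nova Lite	92.4%
6	Nova Micro	90.2%
7	Llama 3.2	78.6%
8	Mixtral 8x7B	59.7%
9	Mixtral 8x7B	59.7%
10	Mistral 7B	55.6%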

Frontier over time

Each dot is one model result; the line traces the running best score.
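A running-best line of this kind can be sketched as a cumulative maximum over results sorted by evaluation date. This is an illustration only, not the site's implementation: the function `running_best` and its `max_score` parameter are hypothetical, and the data are copied from the "All results" table. The optional cap is included because the listed Gemma 2 score is above the benchmark's stated maximum of 100.0%.

```python
from datetime import date
from itertools import accumulate

# (model, score in percent, eval date), copied from the "All results" table.
results = [
    ("Gemma 2", 554.0, date(2025, 2, 25)),
    ("Claude Opus 3", 96.4, date(2024, 10, 22)),
    ("Nemotron 3 Super", 96.1, date(2026, 4, 3)),
    ("Nova Pro", 94.8, date(2024, 12, 3)),
    ("Nova Lite", 92.4, date(2024, 12, 3)),
    ("Nova Micro", 90.2, date(2024, 12, 3)),
    ("Llama 3.2", 78.6, date(2024, 10, 22)),
    ("Mixtral 8x7B", 59.7, date(2023, 12, 1)),
    ("Mixtral 8x7B", 59.7, date(2024, 1, 8)),
    ("Mistral 7B", 55.6, date(2023, 9, 1)),
]


def running_best(rows, max_score=None):
    """Return (eval_date, best_so_far) pairs in chronological order.

    max_score is a hypothetical option: rows scoring above it are skipped.
    """
    if max_score is not None:
        rows = [r for r in rows if r[1] <= max_score]
    # sorted() is stable, so rows sharing a date keep their table order.
    ordered = sorted(rows, key=lambda r: r[2])
    # accumulate with max yields the best score seen up to each row.
    best = accumulate((score for _, score, _ in ordered), max)
    return [(row[2], b) for row, b in zip(ordered, best)]


frontier = running_best(results)
capped_frontier = running_best(results, max_score=100.0)
```

Here `frontier` uses every row as listed, while `capped_frontier` skips any score above 100.0%, which changes where the line ends.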

All results

Showing one canonical row per model.

#	Model	Score	Conditions	Eval date	Source	Flags
1	Gemma 2	554.0%	—	25 Feb 2025	Self-reported	Primary
2	Claude Opus 3	96.4%	25-shot	22 Oct 2024	Self-reported	Primary
3	Nemotron 3 Super	96.1%	25-shot	03 Apr 2026	Self-reported	Primary
4	Nova Pro	94.8%	0-shot	03 Dec 2024	Self-reported	Primary
5	Nova Lite	92.4%	0-shot	03 Dec 2024	Self-reported	Primary
6	Nova Micro	90.2%	0-shot	03 Dec 2024	Self-reported	Primary
7	Llama 3.2	78.6%	0-shot	22 Oct 2024	Self-reported	Primary
8	Mixtral 8x7B	59.7%	—	01 Dec 2023	Self-reported	Primary
9	Mixtral 8x7B	59.7%	—	08 Jan 2024	Self-reported	Primary
10	Mistral 7B	55.6%	—	01 Sep 2023	Self-reported	Primary
