AI model leaderboard

Every tracked model is ranked across the eight headline benchmarks. The Intelligence Index averages each model's normalized scores over the benchmarks it has been scored on; on the live page, clicking any benchmark column header sorts by that column.
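The page does not spell out how scores are normalized before averaging. As a minimal sketch, assume each percentage is already on a 0 to 100 scale and is used as-is, and that missing benchmarks are skipped rather than counted as zero. Under those assumptions a plain mean reproduces the value listed for Kimi K2.6 in the table. The function name and the `MIN_BENCHMARKS` threshold are hypothetical. The threshold is inferred only from the fact that no row in the table shows an index with fewer than three scores.

```python
from typing import Optional

# Assumption: an index is only reported when at least 3 of the 8 benchmarks
# have a score (inferred from the table, not stated by the page).
MIN_BENCHMARKS = 3


def intelligence_index(scores: list[Optional[float]]) -> Optional[float]:
    """Mean of the available scores, rounded to one decimal place.

    Returns None when fewer than MIN_BENCHMARKS scores are available.
    Assumes scores are percentages used as-is (no further normalization).
    """
    available = [s for s in scores if s is not None]
    if len(available) < MIN_BENCHMARKS:
        return None
    return round(sum(available) / len(available), 1)


# Kimi K2.6, in table column order: MMLU-Pro, GPQA Diamond, Humanity's Last Exam,
# AIME 2025, SWE-bench Verified, LiveCodeBench, MMMU, AA-LCR.
kimi_k26 = [None, 90.5, 54.0, None, 80.2, 89.6, None, None]
print(intelligence_index(kimi_k26))  # 78.6, the value listed as "78.6 4/8"
```

Because the mean runs only over available scores, a model scored on a few favorable benchmarks can outrank one scored on many. The coverage figure next to each index (for example 4/8) is what lets a reader account for that.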

Best overall: GPT 5.2 Thinking (90.8 Intelligence Index)
Best at knowledge: GPT 5.6 Sol (94.6% GPQA Diamond)
Best at math: Grok 4 Heavy (100.0% AIME 2025)
Best at coding: Claude Opus 4.7 (87.6% SWE-bench Verified)
Best at multimodal: GPT 5 (84.2% MMMU)

Models × benchmarks

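Each cell shows the model's best primary score on that benchmark; a dash means no score is recorded. The Intelligence Index column also shows how many of the eight benchmarks the model has been scored on (for example, 4/8). On the live page, color intensity reflects the normalized score and clicking a column header re-sorts the table. In this snapshot, rows are sorted by LiveCodeBench score, and models without one follow in alphabetical order.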

Showing 100 of 100 models

#	Model	MMLU-Pro	GPQA Diamond	Humanity's Last Exam	AIME 2025	SWE-bench Verified	LiveCodeBench	MMMU	AA-LCR		Intelligence Index
1	DeepSeek V4 Pro	—	—	—	—	—	93.5%	—	—	—	—
2	Kimi K2.6	—	90.5%	54.0%	—	80.2%	89.6%	—	—	—	78.6 4/8
3	Kimi K2.5	—	—	—	—	76.8%	85.0%	—	—	—	—
4	Qwen 3.6 27B	86.2%	87.8%	24.0%	—	77.2%	83.9%	82.9%	—	—	73.7 6/8
5	DeepSeek 3.2	85.0%	82.4%	40.8%	93.1%	73.1%	83.3%	—	—	—	76.3 6/8
6	GLM 4.6	—	81.0%	17.2%	93.9%	68.0%	82.8%	—	—	—	68.6 5/8
7	Gemma 4	85.2%	84.3%	—	—	—	80.0%	—	—	—	83.2 3/8
8	Grok 4 Heavy	—	88.4%	44.4%	100.0%	—	79.4%	—	—	—	78.1 4/8
9	Grok 3 Think	—	84.6%	—	93.3%	—	79.4%	—	—	—	85.8 3/8
10	Grok 4	—	87.5%	25.4%	91.7%	—	79.0%	—	—	—	70.9 4/8
11	DeepSeek V3.1 Terminus	85.0%	80.7%	21.7%	88.4%	—	74.9%	—	—	—	70.1 5/8
12	DeepSeek V3.2 Exp	85.0%	79.9%	—	89.3%	67.8%	74.1%	—	—	—	79.2 5/8
13	Qwen3 235B A22B	—	—	—	81.5%	—	70.7%	—	—	—	—
14	Gemini 2.5 Pro	—	86.4%	21.6%	88.0%	63.8%	70.4%	—	—	—	66.0 5/8
15	Nemotron 3 Nano	78.3%	75.0%	—	89.1%	—	68.3%	—	—	—	77.7 4/8
16	Nemotron 3	78.3%	75.0%	—	89.1%	38.8%	68.3%	—	—	—	69.9 5/8
17	Qwen3 30B A3B	—	65.8%	—	70.9%	—	62.6%	—	—	—	66.4 3/8
18	Grok 3	79.9%	75.4%	—	—	—	57.0%	73.2%	—	—	71.4 4/8
19	Kimi K2 Instruct	—	75.1%	—	49.5%	65.8%	53.7%	—	—	—	61.0 4/8
20	Magistral Medium	—	70.8%	—	64.9%	—	50.3%	—	—	—	62.0 3/8
21	Llama 4 Behemoth	82.2%	73.7%	—	—	—	49.4%	76.1%	—	—	70.4 4/8
22	Llama 4 Maverick	80.5%	69.8%	—	—	—	43.4%	73.4%	—	—	66.8 4/8
23	Grok 3 mini	78.9%	66.2%	—	—	—	41.5%	69.4%	—	—	64.0 4/8
24	Mistral Large 3	—	43.9%	—	—	—	34.4%	—	—	—	—
25	Gemini 2.5 Flash-Lite	—	64.6%	5.1%	49.8%	31.6%	33.7%	72.9%	—	—	43.0 6/8
26	Llama 4 Scout	74.3%	57.2%	—	—	—	32.8%	69.4%	—	—	58.4 4/8
27	Claude Fable 5	—	—	59.0%	—	—	—	—	—	—	—
28	Claude Haiku 3.5	41.6%	65.0%	—	—	40.6%	—	—	—	—	49.1 3/8
29	Claude Haiku 4.5	—	73.0%	—	80.7%	73.3%	—	73.2%	—	—	75.1 4/8
30	Claude Opus 3	—	50.4%	—	—	—	—	—	—	—	—
31	Claude Opus 4.5	—	87.0%	—	—	80.9%	—	80.7%	—	—	82.9 3/8
32	Claude Opus 4.6	—	91.3%	—	—	80.8%	—	—	—	—	—
33	Claude Opus 4.7	—	94.2%	46.9%	—	87.6%	—	—	—	—	76.2 3/8
34	Claude Opus 4.8	—	—	—	—	—	—	—	—	—	—
35	Claude Sonnet 3.7	—	78.2%	—	—	62.3%	—	75.0%	—	—	71.8 3/8
36	Claude Sonnet 4	—	75.4%	—	70.5%	72.7%	—	74.4%	—	—	73.3 4/8
37	Claude Sonnet 4.5	—	83.4%	—	87.0%	77.2%	—	77.8%	—	—	81.4 4/8
38	Claude Sonnet 4.6	—	89.9%	33.2%	—	79.6%	—	—	—	—	67.6 3/8
39	Claude Sonnet 5	—	—	43.2%	—	—	—	—	—	—	—
40	Code Llama	—	—	—	—	—	—	—	—	—	—
41	Command A	69.6%	50.8%	—	—	—	—	—	—	—	—
42	Command R Plus	—	—	—	—	—	—	—	—	—	—
43	DBRX Instruct	—	—	—	—	—	—	—	—	—	—
44	DeepSeek 3.2 Speciale	—	—	30.6%	96.0%	—	—	—	—	—	—
45	DeepSeek V3	75.9%	59.1%	—	—	42.0%	—	—	—	—	59.0 3/8
46	DeepSeek-R1	84.0%	71.5%	—	70.0%	49.2%	—	—	—	—	68.7 4/8
47	Devstral 2	—	—	—	—	72.2%	—	—	—	—	—
48	Gemini 2.5 Flash	—	82.8%	11.0%	72.0%	60.4%	—	—	—	—	56.6 4/8
49	Gemini 3 Deep Think	—	93.8%	41.0%	—	—	—	—	—	—	—
50	Gemini 3 Flash	—	90.4%	33.7%	95.2%	78.0%	—	—	—	—	74.3 4/8
51	Gemini 3 Pro	—	91.9%	37.5%	95.0%	76.2%	—	—	—	—	75.2 4/8
52	Gemini 3.1 Pro	—	94.3%	44.4%	—	80.6%	—	—	—	—	73.1 3/8
53	Gemini 3.5 Flash	—	—	—	—	—	—	—	—	—	—
54	Gemma 2	—	—	—	—	—	—	—	—	—	—
55	Gemma 3	78.0%	72.6%	—	—	—	—	—	—	—	—
56	GLM 5	—	86.0%	—	—	77.8%	—	—	—	—	—
57	GLM 5.1	—	86.2%	31.0%	—	—	—	—	—	—	—
58	GLM 5.2	—	91.2%	40.5%	—	—	—	—	—	—	—
59	GPT 4.1	—	66.3%	—	—	55.0%	—	75.0%	—	—	65.4 3/8
60	GPT 5	—	85.7%	24.8%	94.6%	74.9%	—	84.2%	—	—	72.8 5/8
61	GPT 5.1	—	88.1%	—	94.6%	74.9%	—	84.2%	—	—	85.5 4/8
62	GPT 5.1 Thinking	—	88.1%	—	94.6%	—	—	—	—	—	—
63	GPT 5.2	—	—	—	—	—	—	—	—	—	—
64	GPT 5.2 Codex	—	—	—	—	—	—	—	—	—	—
65	GPT 5.2 Pro	—	93.2%	—	—	—	—	—	—	—	—
66	GPT 5.2 Thinking	—	92.4%	—	100.0%	80.0%	—	—	—	—	90.8 3/8
67	GPT 5.3 Codex	—	92.6%	—	—	56.8%	—	—	—	—	—
68	GPT 5.4	—	92.8%	—	—	57.7%	—	—	—	—	—
69	GPT 5.4 Mini	—	88.0%	—	—	—	—	—	—	—	—
70	GPT 5.4 Nano	—	82.8%	—	—	—	—	—	—	—	—
71	GPT 5.4 Pro	—	94.4%	—	—	—	—	—	—	—	—
72	GPT 5.5	—	93.6%	41.4%	—	—	—	—	—	—	—
73	GPT 5.5 Instant	—	—	—	81.2%	—	—	—	—	—	—
74	GPT 5.6 Luna	—	92.3%	—	—	—	—	—	—	—	—
75	GPT 5.6 Sol	—	94.6%	—	—	—	—	—	—	—	—
76	GPT 5.6 Terra	—	92.9%	—	—	—	—	—	—	—	—
77	GPT OSS 120B	90.0%	80.1%	—	—	—	—	—	—	—	—
78	GPT-4	—	—	—	—	—	—	—	—	—	—
79	GPT-4 Turbo	—	50.4%	—	—	—	—	—	—	—	—
80	GPT-4o	—	53.6%	—	—	—	—	69.1%	—	—	—
81	Grok Code Fast 1	—	—	—	—	70.8%	—	—	—	—	—
82	Kimi K2	—	—	—	—	—	—	—	—	—	—
83	Kimi K2.7 Code	—	89.6%	32.8%	—	—	—	—	66.3%	—	62.9 3/8
84	LLaMA 2	—	—	—	—	—	—	—	—	—	—
85	Llama 3.1 Nemotron Ultra	—	76.0%	—	—	—	—	—	—	—	—
86	Llama 3.2	—	32.8%	—	—	—	—	—	—	—	—
87	Llama 3.3	68.9%	50.5%	—	—	—	—	—	—	—	—
88	MiMo V2.5 Pro	—	—	48.0%	—	78.9%	—	—	—	—	—
89	MiniMax M2.5	—	—	—	—	80.2%	—	—	—	—	—
90	MiniMax M2.7	—	—	—	—	—	—	—	—	—	—
91	Mistral 7B	—	—	—	—	—	—	—	—	—	—
92	Mistral Medium 3.5	—	—	—	—	77.6%	—	—	—	—	—
93	Mistral NeMo	—	—	—	—	—	—	—	—	—	—
94	Mistral Small 3	66.3%	—	—	—	—	—	—	—	—	—
95	Mixtral 8x22B	—	—	—	—	—	—	—	—	—	—
96	Mixtral 8x7B	—	—	—	—	—	—	—	—	—	—
97	Muse Spark	—	89.5%	42.8%	—	77.4%	—	—	—	—	69.9 3/8
98	Nemotron 3 Super	75.7%	60.0%	—	—	—	—	—	—	—	—
99	Nova Lite	—	42.0%	—	—	—	—	—	—	—	—
100	Nova Micro	—	40.0%	—	—	—	—	—	—	—	—
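
The table can be sorted by any benchmark column. As a rough sketch of that sort, the snippet below parses a few rows copied from the table and orders them by a chosen column. It assumes scored models come first (highest score first) and unscored models follow alphabetically, which matches the ordering visible in the table but is not documented by the page. The helper names `parse_score`, `parse_row`, and `sort_by` are hypothetical.

```python
from typing import Optional

COLUMNS = ["MMLU-Pro", "GPQA Diamond", "Humanity's Last Exam", "AIME 2025",
           "SWE-bench Verified", "LiveCodeBench", "MMMU", "AA-LCR"]
MISSING = "\u2014"  # the dash the table uses for "no score"

# Four rows copied from the table (tab-separated, as in the source).
ROWS = [
    f"8\tGrok 4 Heavy\t{MISSING}\t88.4%\t44.4%\t100.0%\t{MISSING}\t79.4%"
    f"\t{MISSING}\t{MISSING}\t{MISSING}\t78.1 4/8",
    f"32\tClaude Opus 4.6\t{MISSING}\t91.3%\t{MISSING}\t{MISSING}\t80.8%"
    f"\t{MISSING}\t{MISSING}\t{MISSING}\t{MISSING}\t{MISSING}",
    f"33\tClaude Opus 4.7\t{MISSING}\t94.2%\t46.9%\t{MISSING}\t87.6%"
    f"\t{MISSING}\t{MISSING}\t{MISSING}\t{MISSING}\t76.2 3/8",
    f"75\tGPT 5.6 Sol\t{MISSING}\t94.6%\t{MISSING}\t{MISSING}\t{MISSING}"
    f"\t{MISSING}\t{MISSING}\t{MISSING}\t{MISSING}\t{MISSING}",
]


def parse_score(cell: str) -> Optional[float]:
    """Turn '88.4%' into 88.4 and the missing-value dash into None."""
    cell = cell.strip()
    return None if cell == MISSING else float(cell.rstrip("%"))


def parse_row(line: str) -> dict:
    """Split one tab-separated row into a model name plus eight benchmark scores."""
    fields = line.split("\t")
    scores = {name: parse_score(cell) for name, cell in zip(COLUMNS, fields[2:10])}
    return {"model": fields[1], **scores}


def sort_by(rows: list[dict], column: str) -> list[dict]:
    """Scored models first (descending); unscored models after, by name."""
    return sorted(
        rows,
        key=lambda r: (r[column] is None, -(r[column] or 0.0), r["model"].lower()),
    )


rows = [parse_row(line) for line in ROWS]
for row in sort_by(rows, "SWE-bench Verified"):
    print(row["model"], row["SWE-bench Verified"])
```

Sorting on a tuple key keeps unscored models out of the ranking instead of treating a missing score as zero, so a model with no result is never shown as the worst performer.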

Capability scatter

Each dot is a model. Position shows two-axis capability; size reflects how many headline benchmarks the model has been scored on.
