AI model leaderboard
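Every tracked model, compared across the eight headline benchmarks. Rows are sorted by LiveCodeBench score, highest first; models without a LiveCodeBench score follow in alphabetical order. The Intelligence Index is the average of a model's normalized scores, followed by the number of benchmarks that average covers (for example, 78.6 4/8 is an average over four of the eight benchmarks). It is shown only for models with at least three reported scores. A dash marks a benchmark with no reported score.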
Models × benchmarks
| # | Model | MMLU-Pro | GPQA Diamond | Humanity's Last Exam | AIME 2025 | SWE-bench Verified | LiveCodeBench | MMMU | AA-LCR | Intelligence Index (average, benchmarks covered) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | DeepSeek V4 Pro | — | — | — | — | — | 93.5% | — | — | — |
| 2 | Kimi K2.6 | — | 90.5% | 54.0% | — | 80.2% | 89.6% | — | — | 78.6 4/8 |
| 3 | Kimi K2.5 | — | — | — | — | 76.8% | 85.0% | — | — | — |
| 4 | DeepSeek 3.2 | 85.0% | 82.4% | 40.8% | 93.1% | 73.1% | 83.3% | — | — | 76.3 6/8 |
| 5 | GLM 4.6 | — | 81.0% | 17.2% | 93.9% | 68.0% | 82.8% | — | — | 68.6 5/8 |
| 6 | Gemma 4 | 85.2% | 84.3% | — | — | — | 80.0% | — | — | 83.2 3/8 |
| 7 | Grok 4 Heavy | — | 88.4% | 44.4% | 100.0% | — | 79.4% | — | — | 78.1 4/8 |
| 8 | Grok 3 Think | — | 84.6% | — | 93.3% | — | 79.4% | — | — | 85.8 3/8 |
| 9 | Grok 4 | — | 87.5% | 25.4% | 91.7% | — | 79.0% | — | — | 70.9 4/8 |
| 10 | DeepSeek V3.1 Terminus | 85.0% | 80.7% | 21.7% | 88.4% | — | 74.9% | — | — | 70.1 5/8 |
| 11 | DeepSeek V3.2 Exp | 85.0% | 79.9% | — | 89.3% | 67.8% | 74.1% | — | — | 79.2 5/8 |
| 12 | Qwen3-235B-A22B | — | — | — | 81.5% | — | 70.7% | — | — | — |
| 13 | Qwen3 235B A22B | — | — | — | 81.5% | — | 70.7% | — | — | — |
| 14 | Gemini 2.5 Pro | — | 84.0% | 18.8% | 86.7% | 63.8% | 70.4% | — | — | 64.7 5/8 |
| 15 | Nemotron 3 Nano | 78.3% | 75.0% | — | 89.1% | — | 68.3% | — | — | 77.7 4/8 |
| 16 | Qwen3 30B A3B | — | 65.8% | — | 70.9% | — | 62.6% | — | — | 66.4 3/8 |
| 17 | Grok 3 | 79.9% | 75.4% | — | — | — | 57.0% | 73.2% | — | 71.4 4/8 |
| 18 | Kimi K2 Instruct | — | 75.1% | — | 49.5% | 65.8% | 53.7% | — | — | 61.0 4/8 |
| 19 | Magistral Medium | — | 70.8% | — | 64.9% | — | 50.3% | — | — | 62.0 3/8 |
| 20 | Llama 4 Behemoth | 82.2% | 73.7% | — | — | — | 49.4% | 76.1% | — | 70.4 4/8 |
| 21 | Llama 4 Maverick | 80.5% | 69.8% | — | — | — | 43.4% | 73.4% | — | 66.8 4/8 |
| 22 | Grok 3 mini | 78.9% | 66.2% | — | — | — | 41.5% | 69.4% | — | 64.0 4/8 |
| 23 | Mistral Large 3 | — | 43.9% | — | — | — | 34.4% | — | — | — |
| 24 | Gemini 2.5 Flash-Lite | — | 64.6% | 5.10% | 49.8% | 31.6% | 33.7% | 72.9% | — | 43.0 6/8 |
| 25 | Llama 4 Scout | 74.3% | 57.2% | — | — | — | 32.8% | 69.4% | — | 58.4 4/8 |
| 26 | Claude Haiku 3.5 | 41.6% | 65.0% | — | — | 40.6% | — | — | — | 49.1 3/8 |
| 27 | Claude Haiku 4.5 | — | 73.0% | — | 80.7% | 73.3% | — | 73.2% | — | 75.1 4/8 |
| 28 | Claude Opus 3 | — | 50.4% | — | — | — | — | — | — | — |
| 29 | Claude Opus 4.5 | — | 87.0% | — | — | 80.9% | — | 80.7% | — | 82.9 3/8 |
| 30 | Claude Opus 4.6 | — | 91.3% | — | — | 80.8% | — | — | — | — |
| 31 | Claude Opus 4.7 | — | 94.2% | 46.9% | — | 87.6% | — | — | — | 76.2 3/8 |
| 32 | Claude Sonnet 3.7 | — | 62.3% | — | — | 62.3% | — | 71.8% | — | 65.5 3/8 |
| 33 | Claude Sonnet 3.7 (Thinking) | — | 78.2% | — | — | 62.3% | — | 75.0% | — | 71.8 3/8 |
| 34 | Claude Sonnet 4 | — | 75.4% | — | 70.5% | 72.7% | — | 74.4% | — | 73.3 4/8 |
| 35 | Claude Sonnet 4.5 | — | 83.4% | — | 87.0% | 77.2% | — | 77.8% | — | 81.4 4/8 |
| 36 | Claude Sonnet 4.6 | — | 89.9% | 33.2% | — | 79.6% | — | — | — | 67.6 3/8 |
| 37 | Command A | 69.6% | 50.8% | — | — | — | — | — | — | — |
| 38 | DeepSeek 3.2 Speciale | — | — | 30.6% | 96.0% | — | — | — | — | — |
| 39 | DeepSeek V3 | 75.9% | 59.1% | — | — | 42.0% | — | — | — | 59.0 3/8 |
| 40 | DeepSeek-R1 | 84.0% | 71.5% | — | 70.0% | 49.2% | — | — | — | 68.7 4/8 |
| 41 | Devstral 2 | — | — | — | — | 72.2% | — | — | — | — |
| 42 | Gemini 2.5 Flash (Thinking) | — | 82.8% | 11.0% | 72.0% | 60.4% | — | — | — | 56.6 4/8 |
| 43 | Gemini 2.5 Pro (Thinking) | — | 86.4% | 21.6% | 88.0% | 59.6% | — | — | — | 63.9 4/8 |
| 44 | Gemini 3 Deep Think | — | 93.8% | 41.0% | — | — | — | — | — | — |
| 45 | Gemini 3 Flash | — | 90.4% | — | — | 78.0% | — | — | — | — |
| 46 | Gemini 3 Flash (Thinking) | — | 90.4% | 33.7% | 95.2% | 78.0% | — | — | — | 74.3 4/8 |
| 47 | Gemini 3 Pro | — | 91.9% | 37.5% | 95.0% | 76.2% | — | — | — | 75.2 4/8 |
| 48 | Gemini 3.1 Pro | — | 94.3% | 44.4% | — | 80.6% | — | — | — | 73.1 3/8 |
| 49 | Gemma 3 | 78.0% | 72.6% | — | — | — | — | — | — | — |
| 50 | GLM 5 | — | 86.0% | — | — | 77.8% | — | — | — | — |
| 51 | GLM-5.1 | — | 86.2% | 31.0% | — | — | — | — | — | — |
| 52 | GPT 4.1 | — | 66.3% | — | — | 55.0% | — | 75.0% | — | 65.4 3/8 |
| 53 | GPT 5 | — | 77.8% | 6.30% | 61.9% | 52.8% | — | 74.4% | — | 54.6 5/8 |
| 54 | GPT 5 (Thinking) | — | 85.7% | 24.8% | 94.6% | 74.9% | — | 84.2% | — | 72.8 5/8 |
| 55 | GPT 5.1 | — | 88.1% | — | 94.6% | 74.9% | — | 84.2% | — | 85.5 4/8 |
| 56 | GPT 5.1 Thinking | — | 88.1% | — | 94.6% | — | — | — | — | — |
| 57 | GPT 5.2 Pro | — | 93.2% | — | — | — | — | — | — | — |
| 58 | GPT 5.2 Thinking | — | 92.4% | — | 100.0% | 80.0% | — | — | — | 90.8 3/8 |
| 59 | GPT 5.3 Codex | — | 92.6% | — | — | 56.8% | — | — | — | — |
| 60 | GPT 5.4 | — | 92.8% | — | — | 57.7% | — | — | — | — |
| 61 | GPT 5.4 Mini | — | 88.0% | — | — | — | — | — | — | — |
| 62 | GPT 5.4 Nano | — | 82.8% | — | — | — | — | — | — | — |
| 63 | GPT 5.4 Pro | — | 94.4% | — | — | — | — | — | — | — |
| 64 | GPT 5.5 | — | 93.6% | 41.4% | — | — | — | — | — | — |
| 65 | GPT 5.5 Instant | — | — | — | 81.2% | — | — | — | — | — |
| 66 | GPT OSS 120B | 90.0% | 80.1% | — | — | — | — | — | — | — |
| 67 | GPT-4 Turbo | — | 50.4% | — | — | — | — | — | — | — |
| 68 | GPT-4o | — | 53.6% | — | — | — | — | 69.1% | — | — |
| 69 | Grok Code Fast 1 | — | — | — | — | 70.8% | — | — | — | — |
| 70 | Llama 3.1 Nemotron Ultra | — | 76.0% | — | — | — | — | — | — | — |
| 71 | Llama 3.2 | — | 32.8% | — | — | — | — | — | — | — |
| 72 | Llama 3.3 | 68.9% | 50.5% | — | — | — | — | — | — | — |
| 73 | MiniMax M2.5 | — | — | — | — | 80.2% | — | — | — | — |
| 74 | Mistral Medium 3.5 | — | — | — | — | 77.6% | — | — | — | — |
| 75 | Mistral Small 3 | 66.3% | — | — | — | — | — | — | — | — |
| 76 | Muse Spark | — | 89.5% | 42.8% | — | 77.4% | — | — | — | 69.9 3/8 |
| 77 | Nemotron 3 Super | 75.7% | 60.0% | — | — | — | — | — | — | — |
| 78 | Nova Lite | — | 42.0% | — | — | — | — | — | — | — |
| 79 | Nova Micro | — | 40.0% | — | — | — | — | — | — | — |
| 80 | Nova Premier | — | — | — | — | 42.4% | — | — | — | — |
| 81 | Nova Pro | — | 46.9% | — | — | — | — | — | — | — |
| 82 | o1 | — | 78.0% | 8.12% | 79.2% | 48.9% | — | 77.6% | — | 58.4 5/8 |
| 83 | o3 | — | 83.3% | 20.3% | 88.9% | 69.1% | — | 82.9% | — | 68.9 5/8 |
| 84 | o4 mini | — | — | — | 92.7% | 68.1% | — | 81.6% | — | 80.8 3/8 |
| 85 | Opus 4.1 Thinking | — | 80.9% | — | — | 74.5% | — | — | — | — |
| 86 | Phi 4 reasoning plus | 76.0% | 69.3% | — | 78.0% | — | — | — | — | 74.4 3/8 |
| 87 | Pixtral 12B | — | — | — | — | — | — | 52.0% | — | — |
| 88 | Pixtral Large | — | — | — | — | — | — | 64.0% | — | — |
| 89 | Qwen 3.5 122B A10B | — | 86.6% | — | — | 72.0% | — | 76.9% | — | 78.5 3/8 |
| 90 | Qwen 3.5 27B | 86.1% | 85.5% | — | — | 72.4% | — | — | — | 81.3 3/8 |
| 91 | Qwen 3.5 35B A3B | — | 84.2% | — | — | 69.2% | — | — | — | — |
| 92 | Qwen3 Coder | — | — | — | — | 67.0% | — | — | — | — |
| 93 | Qwen3-30B-A3B | — | 65.8% | — | — | — | — | — | — | — |
| 94 | R1 1776 | — | 71.5% | — | 70.0% | — | — | — | — | — |
| 95 | Seed 1.5 | 80.1% | 65.0% | — | — | — | — | 73.9% | — | 73.0 3/8 |
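
The Intelligence Index column and the row ordering in the table above can be reproduced with a short sketch. It rests on assumptions inferred from the table rather than on a documented method: the normalized score is taken to be the percentage as shown, the index is the plain mean of whichever scores are reported, and no index is given when fewer than three scores exist. The names `intelligence_index`, `sort_by_benchmark`, and `MIN_SCORES` are hypothetical.

```python
from typing import Optional

BENCHMARKS = [
    "MMLU-Pro", "GPQA Diamond", "Humanity's Last Exam", "AIME 2025",
    "SWE-bench Verified", "LiveCodeBench", "MMMU", "AA-LCR",
]
MIN_SCORES = 3  # assumption: rows with fewer reported scores show no index

# Benchmark name -> score in percent; a missing key means no reported score.
Scores = dict[str, float]


def intelligence_index(scores: Scores) -> Optional[tuple[float, str]]:
    """Return (mean of reported scores, coverage label), or None if too few scores."""
    reported = [scores[b] for b in BENCHMARKS if b in scores]
    if len(reported) < MIN_SCORES:
        return None
    mean = round(sum(reported) / len(reported), 1)
    return mean, f"{len(reported)}/{len(BENCHMARKS)}"


def sort_by_benchmark(models: dict[str, Scores], benchmark: str) -> list[str]:
    """Model names by one benchmark, highest first; unscored models last, by name."""
    scored = sorted(
        (m for m in models if benchmark in models[m]),
        key=lambda m: -models[m][benchmark],
    )
    unscored = sorted(
        (m for m in models if benchmark not in models[m]),
        key=str.lower,
    )
    return scored + unscored


# Three rows copied from the table above.
models: dict[str, Scores] = {
    "Claude Opus 4.7": {"GPQA Diamond": 94.2, "Humanity's Last Exam": 46.9,
                        "SWE-bench Verified": 87.6},
    "Kimi K2.5": {"SWE-bench Verified": 76.8, "LiveCodeBench": 85.0},
    "Kimi K2.6": {"GPQA Diamond": 90.5, "Humanity's Last Exam": 54.0,
                  "SWE-bench Verified": 80.2, "LiveCodeBench": 89.6},
}

for name in sort_by_benchmark(models, "LiveCodeBench"):
    print(name, intelligence_index(models[name]))
```

Because the mean is taken only over reported scores, two index values are directly comparable only when they cover the same benchmarks. A model scored on three of its stronger benchmarks can outrank one scored on six, which is why the coverage fraction is worth reading alongside the number.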
