GPQA Diamond
PhD-level multiple-choice questions in biology, physics, and chemistry, written by domain experts and designed so that non-experts struggle to answer them even with web search. Diamond is the hardest curated subset.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | GPT 5.4 Pro | 94.4% | CoT | 05 Mar 2026 | Self-reported | Primary |
| 2 | Gemini 3.1 Pro | 94.3% | CoT | 19 Feb 2026 | Self-reported | Primary |
| 3 | Claude Opus 4.7 | 94.2% | — | 16 Apr 2026 | Self-reported | Primary |
| 4 | Gemini 3 Deep Think | 93.8% | CoT | 12 Feb 2026 | Self-reported | Primary |
| 5 | GPT 5.5 | 93.6% | CoT | 23 Apr 2026 | Self-reported | Primary |
| 6 | GPT 5.2 Pro | 93.2% | CoT | 11 Dec 2025 | Self-reported | Primary |
| 7 | GPT 5.4 | 92.8% | CoT | 05 Mar 2026 | Self-reported | Primary |
| 8 | GPT 5.3 Codex | 92.6% | — | 05 Mar 2026 | Self-reported | Primary |
| 9 | GPT 5.2 Thinking | 92.4% | CoT | 11 Dec 2025 | Self-reported | Primary |
| 10 | Qwen 3.7 Max | 92.4% | 0-shot · CoT · standard | 20 May 2026 | Self-reported | |
| 11 | Gemini 3 Pro | 91.9% | CoT | 18 Nov 2025 | Self-reported | Primary |
| 12 | Claude Opus 4.6 | 91.3% | — | 05 Feb 2026 | Self-reported | Primary |
| 13 | Kimi K2.6 | 90.5% | CoT | 20 Apr 2026 | Self-reported | Primary |
| 14 | Gemini 3 Flash | 90.4% | CoT | 17 Dec 2025 | Self-reported | Primary |
| 15 | Gemini 3 Flash (Thinking) | 90.4% | — | 17 Dec 2025 | Self-reported | Primary |
| 16 | Claude Sonnet 4.6 | 89.9% | — | 17 Feb 2026 | Self-reported | Primary |
| 17 | Muse Spark | 89.5% | — | 08 Apr 2026 | Self-reported | Primary |
| 18 | Grok 4 Heavy | 88.4% | CoT | 09 Jul 2025 | Self-reported | Primary |
| 19 | GPT 5.1 | 88.1% | — | 13 Nov 2025 | Self-reported | Primary |
| 20 | GPT 5.1 Thinking | 88.1% | CoT | 12 Nov 2025 | Self-reported | Primary |
| 21 | GPT 5.4 Mini | 88.0% | CoT | 17 Mar 2026 | Self-reported | Primary |
| 22 | Grok 4 | 87.5% | CoT | 09 Jul 2025 | Self-reported | Primary |
| 23 | Claude Opus 4.5 | 87.0% | — | 24 Nov 2025 | Self-reported | Primary |
| 24 | Qwen 3.5 122B A10B | 86.6% | — | 24 Apr 2026 | Third-party | Primary Verified |
| 25 | Gemini 2.5 Pro (Thinking) | 86.4% | — | 17 Dec 2025 | Self-reported | Primary |
| 26 | GLM-5.1 | 86.2% | CoT | 08 Apr 2026 | Self-reported | Primary |
| 27 | GLM 5 | 86.0% | CoT | 12 Feb 2026 | Self-reported | Primary |
| 28 | GPT 5 (Thinking) | 85.7% | — | 07 Aug 2025 | Self-reported | Primary |
| 29 | Qwen 3.5 27B | 85.5% | — | 24 Feb 2026 | Third-party | Primary Verified |
| 30 | Grok 3 Think | 84.6% | CoT | 19 Feb 2025 | Self-reported | Primary |
| 31 | Gemma 4 | 84.3% | CoT | 03 Apr 2026 | Self-reported | Primary |
| 32 | Qwen 3.5 35B A3B | 84.2% | — | 15 Feb 2025 | Third-party | Primary Verified |
| 33 | Gemini 2.5 Pro | 84.0% | CoT | 25 Mar 2025 | Self-reported | Primary |
| 34 | Claude Sonnet 4.5 | 83.4% | CoT | 29 Sep 2025 | Self-reported | Primary |
| 35 | o3 | 83.3% | — | 16 Apr 2025 | Self-reported | Primary |
| 36 | GPT 5.4 Nano | 82.8% | CoT | 17 Mar 2026 | Self-reported | Primary |
| 37 | Gemini 2.5 Flash (Thinking) | 82.8% | — | 17 Dec 2025 | Self-reported | Primary |
| 38 | DeepSeek 3.2 | 82.4% | — | 01 Dec 2025 | Paper | Primary |
| 39 | GLM 4.6 | 81.0% | CoT | 30 Sep 2025 | Self-reported | Primary |
| 40 | Claude Opus 4.1 Thinking | 80.9% | CoT | 05 Aug 2025 | Self-reported | Primary |
| 41 | DeepSeek V3.1 Terminus | 80.7% | — | 22 Sep 2025 | Self-reported | Primary |
| 42 | GPT OSS 120B | 80.1% | CoT | 05 Aug 2025 | Self-reported | Primary |
| 43 | DeepSeek V3.2 Exp | 79.9% | CoT | 29 Sep 2025 | Self-reported | Primary |
| 44 | Claude Sonnet 3.7 (Thinking) | 78.2% | — | 24 Feb 2025 | Self-reported | Primary |
| 45 | o1 | 78.0% | — | 16 Apr 2025 | Self-reported | Primary |
| 46 | GPT 5 | 77.8% | — | 07 Aug 2025 | Self-reported | Primary |
| 47 | Llama 3.1 Nemotron Ultra | 76.0% | — | 08 Apr 2025 | Self-reported | Primary |
| 48 | Claude Sonnet 4 | 75.4% | — | 22 May 2025 | Self-reported | Primary |
| 49 | Grok 3 | 75.4% | — | 19 Feb 2025 | Self-reported | Primary |
| 50 | Grok 3 | 75.4% | — | 19 Feb 2025 | Self-reported | Primary |
| 51 | Kimi K2 Instruct | 75.1% | — | 02 Jul 2025 | Paper | Primary |
| 52 | Nemotron 3 Nano | 75.0% | — | 15 Dec 2025 | Self-reported | Primary |
| 53 | Llama 4 Behemoth | 73.7% | — | 05 Apr 2025 | Self-reported | Primary |
| 54 | Claude Haiku 4.5 | 73.0% | — | 15 Oct 2025 | Self-reported | Primary |
| 55 | Claude Haiku 4.5 | 73.0% | — | 15 Oct 2025 | Self-reported | Primary |
| 56 | Gemma 3 | 72.6% | — | 20 May 2025 | Self-reported | Primary |
| 57 | DeepSeek-R1 | 71.5% | CoT | 21 Jan 2025 | Paper | Primary |
| 58 | R1 1776 | 71.5% | — | 18 Feb 2025 | Self-reported | Primary |
| 59 | Magistral Medium | 70.8% | CoT | 10 Jun 2025 | Self-reported | Primary |
| 60 | Llama 4 Maverick | 69.8% | — | 05 Apr 2025 | Self-reported | Primary |
| 61 | Phi 4 reasoning plus | 69.3% | — | 08 Jul 2026 | Self-reported | Primary |
| 62 | GPT 4.1 | 66.3% | — | 14 Apr 2025 | Self-reported | Primary |
| 63 | Grok 3 mini | 66.2% | — | 19 Feb 2025 | Self-reported | Primary |
| 64 | Qwen3-30B-A3B | 65.8% | CoT | 28 Apr 2025 | Self-reported | Primary |
| 65 | Qwen3 30B A3B | 65.8% | — | 28 Apr 2025 | Self-reported | Primary |
| 66 | Claude Haiku 3.5 | 65.0% | 0-shot · CoT | 22 Oct 2024 | Self-reported | Primary |
| 67 | Seed 1.5 | 65.0% | 0-shot · CoT | 22 Jan 2025 | Self-reported | Primary |
| 68 | Gemini 2.5 Flash-Lite | 64.6% | — | 26 Sep 2025 | Self-reported | Primary |
| 69 | Claude Sonnet 3.7 | 62.3% | — | 24 Feb 2025 | Self-reported | Primary |
| 70 | Gemini 2.0 Flash | 60.1% | 0-shot · CoT · standard | 05 Feb 2025 | Self-reported | |
| 71 | Nemotron 3 Super | 60.0% | 5-shot · CoT | 03 Apr 2026 | Self-reported | Primary |
| 72 | Claude Sonnet 3.5 | 59.4% | 0-shot · CoT · standard | 20 Jun 2024 | Self-reported | |
| 73 | DeepSeek V3 | 59.1% | — | 26 Dec 2024 | Paper | Primary |
| 74 | Llama 4 Scout | 57.2% | — | 05 Apr 2025 | Self-reported | Primary |
| 75 | GPT-4o | 53.6% | — | 16 Apr 2025 | Self-reported | Primary |
| 76 | Command A | 50.8% | — | 07 Apr 2025 | Paper | Primary |
| 77 | Command A | 50.8% | — | 07 Apr 2025 | Self-reported | Primary |
| 78 | Llama 3.3 | 50.5% | 0-shot · CoT | 06 Dec 2025 | Self-reported | Primary |
| 79 | GPT-4 Turbo | 50.4% | — | 01 Jan 2024 | Paper | Primary |
| 80 | Claude Opus 3 | 50.4% | — | 04 Mar 2024 | Self-reported | Primary |
| 81 | Nova Pro | 46.9% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary |
| 82 | Mistral Large 3 | 43.9% | 5-shot | 02 Dec 2025 | Self-reported | Primary |
| 83 | Nova Lite | 42.0% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary |
| 84 | Nova Micro | 40.0% | 0-shot · CoT | 03 Dec 2024 | Self-reported | Primary |
| 85 | Claude Haiku 3 | 33.3% | 0-shot · CoT · standard | 04 Mar 2024 | Self-reported | |
| 86 | Llama 3.2 | 32.8% | 0-shot | 25 Oct 2024 | Self-reported | Primary |
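
Many adjacent rows in the table differ by only a few tenths of a point, so it can help to convert score gaps into numbers of questions. The sketch below is a rough illustration, not part of the leaderboard's methodology. It assumes the Diamond subset has 198 questions (the figure from the GPQA paper, not stated on this page) and treats each score as a single-run accuracy. Reported scores may be averaged over several runs, in which case the standard error shown here is only an approximate guide. The function names are hypothetical.

```python
import math

# Assumption: GPQA Diamond contains 198 questions (from the GPQA paper,
# not from this page).
N_QUESTIONS = 198


def question_step(n: int = N_QUESTIONS) -> float:
    """Percentage points gained by answering one more question correctly."""
    return 100.0 / n


def standard_error(score_pct: float, n: int = N_QUESTIONS) -> float:
    """Binomial standard error, in percentage points, of a single-run score."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def gap_in_questions(score_a: float, score_b: float, n: int = N_QUESTIONS) -> float:
    """Express the gap between two scores as a number of questions."""
    return abs(score_a - score_b) / question_step(n)


if __name__ == "__main__":
    # Scores taken from rows 1 and 5 of the table above.
    print(f"one question     = {question_step():.2f} points")
    print(f"std error @94.4% = {standard_error(94.4):.2f} points")
    print(f"94.4% vs 93.6%   = {gap_in_questions(94.4, 93.6):.1f} questions")
```

Under these assumptions, one question is worth about half a percentage point, so gaps of a few tenths of a point between neighbouring rows correspond to less than one question's difference on a single run.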
