SWE-bench Verified

A set of 500 manually validated GitHub issues drawn from popular Python repositories. For each issue, a model must produce a patch that passes a test suite it does not see during generation. It is currently the standard benchmark for "real software engineering" capability.
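Each score below is a resolved rate: the percentage of the 500 instances whose generated patch passes the evaluation tests. The sketch converts between instance counts and percentages. The `mean_over_runs` helper is an assumption: it reflects our reading that an "avg@4" condition (as in the MiniMax M2.5 row) means the resolved rate averaged over four independent runs, which this page does not define.

```python
TOTAL_INSTANCES = 500  # task instances in SWE-bench Verified


def resolved_rate(resolved: int, total: int = TOTAL_INSTANCES) -> float:
    """Percentage of instances whose generated patch passes the evaluation tests."""
    if not 0 <= resolved <= total:
        raise ValueError(f"resolved must be in [0, {total}], got {resolved}")
    return 100.0 * resolved / total


def instances_resolved(score_pct: float, total: int = TOTAL_INSTANCES) -> float:
    """Convert a reported percentage back into an instance count."""
    return score_pct * total / 100.0


def mean_over_runs(resolved_per_run: list[int], total: int = TOTAL_INSTANCES) -> float:
    """Assumed meaning of "avg@k": the mean resolved rate over k independent runs."""
    if not resolved_per_run:
        raise ValueError("need at least one run")
    rates = [resolved_rate(n, total) for n in resolved_per_run]
    return sum(rates) / len(rates)


print(resolved_rate(438))        # 87.6
print(instances_resolved(72.2))  # 361.0
```

So the top score of 87.6% corresponds to 438 of the 500 issues resolved, and the median of 72.2% to 361.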

Category: Coding · Modality: Text · Metric: Accuracy (max 100.0%) · Released: Aug 2024


Results

Top score: 87.6% (Claude Opus 4.7)

Median score: 72.2%

Best results

Top primary scores; one row per model. [Chart: the ten highest scores, from 87.6% (Claude Opus 4.7) down to 78.0% (Gemini 3 Flash).]

Frontier over time

Each dot is one model result; the line traces the running best score.
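The running-best line can be sketched in Python: sort dated results by evaluation date and keep each result that beats every earlier one. The rows are a subset copied from the table below. Skipping undated rows (such as Qwen 3.6 27B) and treating ties as not advancing the frontier are our assumptions about how the chart is drawn, not documented behavior of this page.

```python
from datetime import date
from typing import NamedTuple, Optional


class Result(NamedTuple):
    model: str
    eval_date: Optional[date]  # None when the table lists no eval date
    score: float               # percent of the 500 instances resolved


def running_best(results: list[Result]) -> list[Result]:
    """Return the results that set a new best score, in eval-date order.

    Undated rows are skipped because they cannot be placed on a time axis.
    A score equal to the current best does not count as a new frontier point.
    """
    dated = sorted((r for r in results if r.eval_date is not None),
                   key=lambda r: r.eval_date)  # stable: same-date rows keep input order
    frontier: list[Result] = []
    best = float("-inf")
    for r in dated:
        if r.score > best:
            best = r.score
            frontier.append(r)
    return frontier


# A subset of rows copied from the "All results" table.
ROWS = [
    Result("Claude Haiku 3.5", date(2024, 10, 22), 40.6),
    Result("DeepSeek V3", date(2024, 12, 26), 42.0),
    Result("DeepSeek-R1", date(2025, 1, 21), 49.2),
    Result("Claude Sonnet 3.7", date(2025, 2, 24), 62.3),
    Result("o1", date(2025, 4, 16), 48.9),
    Result("o3", date(2025, 4, 16), 69.1),
    Result("Claude Sonnet 4", date(2025, 5, 22), 72.7),
    Result("Claude Sonnet 4.5", date(2025, 9, 29), 77.2),
    Result("Claude Opus 4.5", date(2025, 11, 24), 80.9),
    Result("Claude Opus 4.6", date(2026, 2, 5), 80.8),
    Result("Qwen 3.6 27B", None, 77.2),
    Result("Claude Opus 4.7", date(2026, 4, 16), 87.6),
]

for r in running_best(ROWS):
    print(f"{r.eval_date}  {r.score:5.1f}%  {r.model}")
```

On this subset, Claude Opus 4.6 (80.8%) is not a frontier point because Claude Opus 4.5 had already reached 80.9% earlier, and o1 is dropped because o3 and Claude Sonnet 3.7 both score higher by that date.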

All results

Showing one canonical row per model.

#	Model	Score	Conditions	Eval date	Source	Flags
1	Claude Opus 4.7	87.6%	CoT	16 Apr 2026	Self-reported	Primary
2	Claude Opus 4.5	80.9%	—	24 Nov 2025	Self-reported	Primary
3	Claude Opus 4.6	80.8%	—	05 Feb 2026	Self-reported	Primary
4	Gemini 3.1 Pro	80.6%	CoT	19 Feb 2026	Self-reported	Primary
5	MiniMax M2.5	80.2%	0-shot · CoT · agentic, avg@4	12 Feb 2026	Self-reported	Primary
6	Kimi K2.6	80.2%	CoT	20 Apr 2026	Self-reported	Primary
7	GPT 5.2 Thinking	80.0%	CoT	11 Dec 2025	Self-reported	Primary
8	Claude Sonnet 4.6	79.6%	—	17 Feb 2026	Self-reported	Primary
9	MiMo V2.5 Pro	78.9%	0-shot · agentic	27 Apr 2026	Self-reported	Primary
10	Gemini 3 Flash	78.0%	CoT	17 Dec 2025	Self-reported	Primary
11	Gemini 3 Flash (Thinking)	78.0%	—	17 Dec 2025	Self-reported	Primary
12	GLM 5	77.8%	CoT	12 Feb 2026	Self-reported	Primary
13	Mistral Medium 3.5	77.6%	—	27 Apr 2026	Self-reported	Primary
14	Muse Spark	77.4%	CoT	08 Apr 2026	Self-reported	Primary
15	Claude Sonnet 4.5	77.2%	CoT	29 Sep 2025	Self-reported	Primary
16	Qwen 3.6 27B	77.2%	0-shot · agentic	—	Self-reported	Primary
17	Kimi K2.5	76.8%	CoT	27 Jan 2026	Self-reported	Primary
18	Gemini 3 Pro	76.2%	CoT	18 Nov 2025	Self-reported	Primary
19	GPT 5.1	74.9%	0-shot · CoT	13 Nov 2025	Self-reported	Primary
20	GPT 5 (Thinking)	74.9%	—	07 Aug 2025	Self-reported	Primary
21	Opus 4.1 Thinking	74.5%	CoT	05 Aug 2025	Self-reported	Primary
22	Claude Haiku 4.5	73.3%	0-shot · CoT	15 Oct 2025	Self-reported	Primary
23	Claude Haiku 4.5	73.3%	—	15 Oct 2025	Self-reported	Primary
24	Deepseek 3.2	73.1%	—	01 Dec 2025	Paper	Primary Verified
25	Claude Sonnet 4	72.7%	—	22 May 2025	Self-reported	Primary
26	Qwen 3.5 27B	72.4%	—	24 Feb 2026	Third-party	Primary Verified
27	Devstral 2	72.2%	0-shot	09 Dec 2025	Self-reported	Primary
28	Qwen 3.5 122B A10B	72.0%	—	24 Feb 2026	Third-party	Primary Verified
29	Grok Code Fast 1	70.8%	CoT	09 Jul 2025	Self-reported	Primary
30	Qwen 3.5 35B A3B	69.2%	—	24 Feb 2026	Third-party	Primary Verified
31	o3	69.1%	—	16 Apr 2025	Self-reported	Primary
32	o4 mini	68.1%	—	16 Apr 2025	Self-reported	Primary
33	GLM 4.6	68.0%	CoT	30 Sep 2025	Self-reported	Primary
34	DeepSeek V3.2 Exp	67.8%	CoT	29 Sep 2025	Self-reported	Primary
35	Qwen3 Coder	67.0%	—	22 Jul 2025	Self-reported	Primary
36	Kimi K2 Instruct	65.8%	—	20 Jul 2025	Paper	Primary
37	Gemini 2.5 Pro	63.8%	CoT	25 Mar 2025	Self-reported	Primary
38	Trinity Large Thinking	63.2%	0-shot · standard	01 Apr 2026	Self-reported	Primary
39	Claude Sonnet 3.7 (Thinking)	62.3%	—	24 Feb 2025	Self-reported	Primary
40	Claude Sonnet 3.7	62.3%	—	24 Feb 2025	Self-reported	Primary
41	Gemini 2.5 Flash (Thinking)	60.4%	—	17 Dec 2025	Self-reported	Primary
42	Gemini 2.5 Pro (Thinking)	59.6%	—	17 Dec 2025	Self-reported	Primary
43	GPT 5.4	57.7%	—	05 Mar 2026	Self-reported	Primary
44	GPT 5.3 Codex	56.8%	—	05 Mar 2026	Self-reported	Primary
45	GPT 4.1	55.0%	—	14 Apr 2025	Self-reported	Primary
46	GPT 5	52.8%	—	07 Aug 2025	Self-reported	Primary
47	DeepSeek-R1	49.2%	CoT	21 Jan 2025	Paper	Primary
48	o1	48.9%	—	16 Apr 2025	Self-reported	Primary
49	Nova Premier	42.4%	—	30 Apr 2025	Self-reported	Primary
50	DeepSeek V3	42.0%	—	26 Dec 2024	Paper	Primary
51	Claude Haiku 3.5	40.6%	—	22 Oct 2024	Self-reported	Primary
52	Nemotron 3	38.8%	standard	15 Dec 2025	Self-reported	Primary
53	Gemini 2.5 Flash-Lite	31.6%	—	26 Sep 2025	Self-reported	Primary
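The summary figures under Results can be cross-checked against this table. The sketch assumes the median is taken over the Score column exactly as listed (53 rows, including both Claude Haiku 4.5 rows); with an odd count, `statistics.median` returns the middle value of the sorted scores.

```python
from statistics import median

# Score column of the "All results" table, rows 1-53 in table order.
SCORES = [
    87.6, 80.9, 80.8, 80.6, 80.2, 80.2, 80.0, 79.6, 78.9, 78.0,
    78.0, 77.8, 77.6, 77.4, 77.2, 77.2, 76.8, 76.2, 74.9, 74.9,
    74.5, 73.3, 73.3, 73.1, 72.7, 72.4, 72.2, 72.0, 70.8, 69.2,
    69.1, 68.1, 68.0, 67.8, 67.0, 65.8, 63.8, 63.2, 62.3, 62.3,
    60.4, 59.6, 57.7, 56.8, 55.0, 52.8, 49.2, 48.9, 42.4, 42.0,
    40.6, 38.8, 31.6,
]

print(len(SCORES), max(SCORES), median(SCORES))  # 53 87.6 72.2
```

The median lands on row 27 (Devstral 2, 72.2%), the 27th of 53 sorted values.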

Go to section

Search

SWE-bench Verified

Best results

Frontier over time

All results

Help

People also viewed

Create AI Tools

Mini Tool

Vibe code an AI Tool

Choose listing type: