
SWE-bench Verified

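SWE-bench Verified consists of 500 manually validated GitHub issues from popular Python repositories. For each issue, a model must produce a patch that passes the hidden test suite. The benchmark is widely treated as the current standard for measuring "real software engineering" capability.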
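As a rough illustration of how a score on this benchmark can be read, the sketch below computes a resolved rate as the share of issues whose patch passed the tests. It assumes a plain pass/fail outcome per issue and a simple fraction over all 500 issues; the function name `resolved_rate` and the outcome list are hypothetical, and individual entries on this page may use other protocols (for example averaging over several runs).

```python
def resolved_rate(outcomes: list[bool]) -> float:
    """Return the percentage of issues whose patch passed the tests."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical outcome list: 438 resolved issues out of 500.
# Under the simple-fraction assumption this corresponds to 87.6%.
outcomes = [True] * 438 + [False] * 62
score = resolved_rate(outcomes)
print(f"{score:.1f}%")
```

Under that assumption, each resolved issue is worth 0.2 percentage points, so small gaps between adjacent scores correspond to only a handful of issues.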

Coding · Text · accuracy · Max 100.0% · Released Aug 2024
Results: 49
Models scored: 48
Top: 87.6% (Claude Opus 4.7)
Median: 72.2%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: best score over time; vertical axis 0 to 100 (score, %), horizontal axis Oct 2024 to Apr 2026]
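The running best line described above can be sketched as a cumulative maximum over results sorted by evaluation date. This is a minimal sketch, not the page's actual plotting code: the function name `running_best` is hypothetical, and the rows are only a small subset of the results listed on this page.

```python
from datetime import date

# A small subset of (eval date, model, score) rows from the results list.
results = [
    (date(2025, 4, 14), "GPT 4.1", 55.0),
    (date(2024, 10, 22), "Claude Haiku 3.5", 40.6),
    (date(2025, 2, 24), "Claude Sonnet 3.7", 62.3),
    (date(2024, 12, 26), "DeepSeek V3", 42.0),
    (date(2025, 5, 22), "Claude Sonnet 4", 72.7),
    (date(2025, 1, 21), "DeepSeek-R1", 49.2),
]


def running_best(rows):
    """Return (date, best score so far) pairs in chronological order."""
    frontier = []
    best = float("-inf")
    for day, _model, score in sorted(rows):
        best = max(best, score)  # the line never moves down
        frontier.append((day, best))
    return frontier


for day, best in running_best(results):
    print(day.isoformat(), best)
```

A result below the current best (GPT 4.1 at 55.0% in this subset) still appears as a dot but leaves the line flat.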

All results

Showing one canonical row per model.
# | Model | Score | Conditions | Eval date | Source | Flags
1 | Claude Opus 4.7 | 87.6% | CoT | Apr 16, 2026 | self reported | primary
2 | Claude Opus 4.5 | 80.9% | | Nov 24, 2025 | self reported | primary
3 | Claude Opus 4.6 | 80.8% | | Feb 5, 2026 | self reported | primary
4 | Gemini 3.1 Pro | 80.6% | CoT | Feb 19, 2026 | self reported | primary
5 | MiniMax M2.5 | 80.2% | 0-shot · CoT · agentic, avg@4 | Feb 12, 2026 | self reported | primary
6 | Kimi K2.6 | 80.2% | CoT | Apr 20, 2026 | self reported | primary
7 | GPT 5.2 Thinking | 80.0% | CoT | Dec 11, 2025 | self reported | primary
8 | Claude Sonnet 4.6 | 79.6% | | Feb 17, 2026 | self reported | primary
9 | Gemini 3 Flash | 78.0% | CoT | Dec 17, 2025 | self reported | primary
10 | Gemini 3 Flash (Thinking) | 78.0% | | Dec 17, 2025 | self reported | primary
11 | GLM 5 | 77.8% | CoT | Feb 12, 2026 | self reported | primary
12 | Mistral Medium 3.5 | 77.6% | | Apr 27, 2026 | self reported | primary
13 | Muse Spark | 77.4% | CoT | Apr 8, 2026 | self reported | primary
14 | Claude Sonnet 4.5 | 77.2% | CoT | Sep 29, 2025 | self reported | primary
15 | Kimi K2.5 | 76.8% | CoT | Jan 27, 2026 | self reported | primary
16 | Gemini 3 Pro | 76.2% | CoT | Nov 18, 2025 | self reported | primary
17 | GPT 5.1 | 74.9% | 0-shot · CoT | Nov 13, 2025 | self reported | primary
18 | GPT 5 (Thinking) | 74.9% | | Aug 7, 2025 | self reported | primary
19 | Opus 4.1 Thinking | 74.5% | CoT | Aug 5, 2025 | self reported | primary
20 | Claude Haiku 4.5 | 73.3% | 0-shot · CoT | Oct 15, 2025 | self reported | primary
21 | Claude Haiku 4.5 | 73.3% | | Oct 15, 2025 | self reported | primary
22 | DeepSeek 3.2 | 73.1% | | Dec 1, 2025 | paper | primary, verified
23 | Claude Sonnet 4 | 72.7% | | May 22, 2025 | self reported | primary
24 | Qwen 3.5 27B | 72.4% | | Feb 24, 2026 | third party | primary, verified
25 | Devstral 2 | 72.2% | 0-shot | Dec 9, 2025 | self reported | primary
26 | Qwen 3.5 122B A10B | 72.0% | | Feb 24, 2026 | third party | primary, verified
27 | Grok Code Fast 1 | 70.8% | CoT | Jul 9, 2025 | self reported | primary
28 | Qwen 3.5 35B A3B | 69.2% | | Feb 24, 2026 | third party | primary, verified
29 | o3 | 69.1% | | Apr 16, 2025 | self reported | primary
30 | o4 mini | 68.1% | | Apr 16, 2025 | self reported | primary
31 | GLM 4.6 | 68.0% | CoT | Sep 30, 2025 | self reported | primary
32 | DeepSeek V3.2 Exp | 67.8% | CoT | Sep 29, 2025 | self reported | primary
33 | Qwen3 Coder | 67.0% | | Jul 22, 2025 | self reported | primary
34 | Kimi K2 Instruct | 65.8% | | Jul 20, 2025 | paper | primary
35 | Gemini 2.5 Pro | 63.8% | CoT | Mar 25, 2025 | self reported | primary
36 | Claude Sonnet 3.7 (Thinking) | 62.3% | | Feb 24, 2025 | self reported | primary
37 | Claude Sonnet 3.7 | 62.3% | | Feb 24, 2025 | self reported | primary
38 | Gemini 2.5 Flash (Thinking) | 60.4% | | Dec 17, 2025 | self reported | primary
39 | Gemini 2.5 Pro (Thinking) | 59.6% | | Dec 17, 2025 | self reported | primary
40 | GPT 5.4 | 57.7% | | Mar 5, 2026 | self reported | primary
41 | GPT 5.3 Codex | 56.8% | | Mar 5, 2026 | self reported | primary
42 | GPT 4.1 | 55.0% | | Apr 14, 2025 | self reported | primary
43 | GPT 5 | 52.8% | | Aug 7, 2025 | self reported | primary
44 | DeepSeek-R1 | 49.2% | CoT | Jan 21, 2025 | paper | primary
45 | o1 | 48.9% | | Apr 16, 2025 | self reported | primary
46 | Nova Premier | 42.4% | | Apr 30, 2025 | self reported | primary
47 | DeepSeek V3 | 42.0% | | Dec 26, 2024 | paper | primary
48 | Claude Haiku 3.5 | 40.6% | | Oct 22, 2024 | self reported | primary
49 | Gemini 2.5 Flash-Lite | 31.6% | | Sep 26, 2025 | self reported | primary