
HumanEval

HumanEval (pass@1)

OpenAI's 164 hand-written Python programming problems, each checked by unit tests. The original code-LLM benchmark; it is now saturated and widely suspected of contamination, since its problems are likely present in modern training corpora.
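HumanEval scores functional correctness: a model's completion counts as correct only if the problem's unit tests pass. As a rough illustration, the sketch below follows the general record layout of OpenAI's human-eval release (task_id, prompt, test, entry_point, with the tests written as a check(candidate) function; canonical_solution is omitted here). The toy problem and the passes() helper are hypothetical, and unlike a real harness this sketch does not sandbox execution or enforce a timeout.

```python
# Hypothetical toy problem laid out like a HumanEval record.
problem = {
    "task_id": "Toy/0",  # hypothetical ID, not a real HumanEval task
    "prompt": 'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n',
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
    "entry_point": "add",
}


def passes(problem: dict, completion: str) -> bool:
    """Return True if prompt + completion passes the problem's unit tests.

    WARNING: exec() runs untrusted model output in this process; a real
    harness should isolate it and apply a timeout.
    """
    program = problem["prompt"] + completion + "\n" + problem["test"]
    namespace: dict = {}
    try:
        exec(program, namespace)  # defines the function and check()
        namespace["check"](namespace[problem["entry_point"]])
        return True
    except Exception:  # failed assertion, syntax error, runtime error, ...
        return False


print(passes(problem, "    return a + b\n"))  # True
print(passes(problem, "    return a - b\n"))  # False
```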

Tags: Coding · Text · Metric: pass@k · Max score: 100.0% · Released: Jul 2021 · Saturated · Possibly contaminated
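The pass@k metric comes from the Codex paper that introduced HumanEval (Chen et al., 2021): sample n completions per problem, count the c that pass, and estimate pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. For k = 1 this reduces to c/n, which is how a "pass@1, n=32" condition can be read. A minimal sketch of that estimator follows; the per-problem counts are made up for illustration.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # 1 minus the probability that a random size-k subset of the n samples
    # contains no correct sample. comb(n - c, k) is 0 when n - c < k.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts of correct samples for four problems, n = 32 each.
correct_counts = [32, 20, 0, 5]
n = 32
for k in (1, 10):
    score = mean(pass_at_k(n, c, k) for c in correct_counts)
    print(f"pass@{k} = {score:.3f}")
```

Python's exact integer arithmetic in math.comb avoids overflow here; the Codex paper instead uses an equivalent running product for numerical stability with NumPy.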
Results: 14
Models scored: 13
Top: GPT-4o, 90.2%
Median: 72.6%
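The summary figures can be reproduced if they are assumed to count only the rows flagged Primary in the results table (an inference: it matches all four numbers, and the 92.0% non-primary Claude Sonnet 3.5 entry is then correctly excluded from "Top"). A quick check of that arithmetic:

```python
from statistics import median

# Rows flagged "Primary" in the All results table: (model, score in %).
primary = [
    ("GPT-4o", 90.2), ("Llama 3.3", 88.4), ("Claude Haiku 3.5", 88.1),
    ("Claude Opus 3", 84.9), ("Mistral Small 3", 84.8),
    ("Nemotron 3 Super", 79.4), ("WizardCoder", 73.2), ("Pixtral 12B", 72.0),
    ("Code Llama", 67.8), ("Mixtral 8x7B", 40.2), ("Mixtral 8x7B", 40.2),
    ("Mistral 7B", 30.5), ("LLaMA 2", 29.9), ("Gemma 2", 17.7),
]

results = len(primary)                       # one entry per primary result
models = len({name for name, _ in primary})  # Mixtral 8x7B is listed twice
top_model, top_score = max(primary, key=lambda row: row[1])
med = median(score for _, score in primary)  # mean of the 7th and 8th values
print(results, models, top_model, top_score, round(med, 1))
# prints: 14 13 GPT-4o 90.2 72.6
```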

Best results

Top primary scores; one row per model.
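A sketch of the "top primary scores, one row per model" rule, assuming it means: drop non-primary rows, keep each model's highest remaining score, and sort descending. The best_per_model() helper is hypothetical and the input is a handful of rows taken from the results table.

```python
def best_per_model(rows):
    """Keep primary rows only, then each model's best score, sorted descending."""
    best: dict[str, float] = {}
    for model, score, is_primary in rows:
        if is_primary and score > best.get(model, float("-inf")):
            best[model] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# A few rows from the All results table: (model, score %, flagged Primary?).
rows = [
    ("Claude Sonnet 3.5", 92.0, False),
    ("GPT-4o", 90.2, True),
    ("Mixtral 8x7B", 40.2, True),
    ("Mixtral 8x7B", 40.2, True),
    ("Gemini 1.5", 84.1, False),
    ("Mistral 7B", 30.5, True),
]
for model, score in best_per_model(rows):
    print(f"{model}: {score:.1f}%")
# GPT-4o: 90.2%
# Mixtral 8x7B: 40.2%
# Mistral 7B: 30.5%
```

Under this rule the two Mixtral 8x7B entries collapse into one row, and the non-primary 92.0% result does not appear.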

Frontier over time

Each dot is one model result; the line traces the running best score.
[Figure: Best score over time. y-axis: score (0 to 100%); x-axis: evaluation date.]
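The frontier line is a running maximum: sort results by evaluation date and, at each point, keep the best score seen so far. A minimal sketch using a few primary results from the table (the running_best() helper is hypothetical, and which rows the site actually plots is not stated):

```python
from datetime import date


def running_best(results):
    """Return (date, best-so-far) points after sorting results by eval date."""
    frontier = []
    best = float("-inf")
    for when, score in sorted(results):
        best = max(best, score)
        frontier.append((when, best))
    return frontier


# A few primary results from the table: (eval date, score %).
results = [
    (date(2024, 12, 6), 88.4),   # Llama 3.3
    (date(2023, 8, 1), 73.2),    # WizardCoder
    (date(2023, 9, 1), 30.5),    # Mistral 7B
    (date(2025, 4, 16), 90.2),   # GPT-4o
    (date(2026, 4, 3), 79.4),    # Nemotron 3 Super
]
for when, best in running_best(results):
    print(when.isoformat(), best)
# 2023-08-01 73.2
# 2023-09-01 73.2
# 2024-12-06 88.4
# 2025-04-16 90.2
# 2026-04-03 90.2
```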

All results

Showing all configurations, including non-primary alternates.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|-------|-------|------------|-----------|--------|-------|
| 1 | Claude Sonnet 3.5 | 92.0% | 0-shot · standard | 20 Jun 2024 | Self-reported | |
| 2 | GPT-4o | 90.2% | | 16 Apr 2025 | Self-reported | Primary |
| 3 | Llama 3.3 | 88.4% | 0-shot · Pass@1 | 06 Dec 2024 | Self-reported | Primary |
| 4 | Claude Haiku 3.5 | 88.1% | 0-shot | 22 Oct 2024 | Self-reported | Primary |
| 5 | Claude Opus 3 | 84.9% | 0-shot | 22 Oct 2024 | Self-reported | Primary |
| 6 | Mistral Small 3 | 84.8% | Pass@1 | 30 Dec 0025 | Self-reported | Primary |
| 7 | Gemini 1.5 | 84.1% | 0-shot · standard | 01 May 2024 | Self-reported | |
| 8 | Nemotron 3 Super | 79.4% | 0-shot · pass@1, n=32 | 03 Apr 2026 | Self-reported | Primary |
| 9 | Claude Haiku 3 | 75.9% | 0-shot · standard | 04 Mar 2024 | Self-reported | |
| 10 | Gemini Ultra | 74.4% | 0-shot · standard | 06 Dec 2023 | Self-reported | |
| 11 | Gemini 1.5 Flash | 74.3% | 0-shot · standard | 01 May 2024 | Self-reported | |
| 12 | WizardCoder | 73.2% | | 01 Aug 2023 | Paper | Primary |
| 13 | Pixtral 12B | 72.0% | Pass@1 | 10 Oct 2024 | Self-reported | Primary |
| 14 | Claude 2 | 71.2% | 0-shot · standard | 11 Jul 2023 | Self-reported | |
| 15 | Code Llama | 67.8% | | 01 Aug 2023 | Paper | Primary |
| 16 | GPT 3.5 | 48.1% | 0-shot · standard | 14 Mar 2023 | Self-reported | |
| 17 | Mixtral 8x7B | 40.2% | | 01 Dec 2023 | Paper | Primary |
| 18 | Mixtral 8x7B | 40.2% | | 08 Jan 2024 | Self-reported | Primary |
| 19 | Mistral 7B | 30.5% | | 01 Sep 2023 | Paper | Primary |
| 20 | LLaMA 2 | 29.9% | 0-shot | 19 Jul 2023 | Paper | Primary, Verified |
| 21 | LLaMA 2 70B | 29.9% | 0-shot | 11 Jul 2023 | Paper | |
| 22 | Gemma 2 | 17.7% | Pass@1 | 25 Feb 2025 | Self-reported | Primary |