BIG-Bench Hard

A 23-task subset of BIG-Bench on which prior language models did not outperform the average human rater. The tasks mix logical, algorithmic, and language-understanding problems.

Reasoning · Text · Accuracy (max 100.0%) · Released Oct 2022

Results: 6
Models scored: 6
Top score: 91.6% (Seed 1.5)
Median score: 84.6%
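
The top and median figures can be reproduced from the six scores in the results table on this page. The sketch below is only an illustration: it assumes the site uses the ordinary median (the mean of the two middle values when the count is even), which happens to match the displayed 84.6%. The variable names are hypothetical.

```python
from statistics import median

# Scores copied from the results table on this page (percent accuracy).
scores = {
    "Seed 1.5": 91.6,
    "Nova Pro": 86.9,
    "Claude Opus 3": 86.8,
    "Nova Lite": 82.4,
    "Nova Micro": 79.5,
    "LLaMA 2": 51.2,
}

# Model with the highest score.
top_model = max(scores, key=scores.get)

# Assumption: ordinary median; with six values this averages the two
# middle scores (82.4 and 86.8). Rounded to one decimal like the page.
median_score = round(median(scores.values()), 1)

print(f"Top: {top_model} ({scores[top_model]}%)")
print(f"Median: {median_score}%")
```

Because most entries are self-reported under different conditions, the median is a rough summary of this small sample rather than a like-for-like comparison.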

Best results

Top primary scores; one row per model.

1. Seed 1.5: 91.6%
2. Nova Pro: 86.9%
3. Claude Opus 3: 86.8%
4. Nova Lite: 82.4%
5. Nova Micro: 79.5%
6. LLaMA 2: 51.2%

Frontier over time

Each dot is one model result; the line traces the running best score.
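
A running best score of this kind can be sketched as a cumulative maximum over results sorted by evaluation date. The example below is a minimal illustration using the rows from the results table on this page; the function name `running_best` and the tuple layout are hypothetical, and how the site orders results that share a date is not stated, so the list order is kept for ties.

```python
from datetime import date

# (model, eval date, score in percent), copied from the results table.
results = [
    ("LLaMA 2", date(2023, 7, 19), 51.2),
    ("Claude Opus 3", date(2024, 10, 22), 86.8),
    ("Nova Pro", date(2024, 12, 3), 86.9),
    ("Nova Lite", date(2024, 12, 3), 82.4),
    ("Nova Micro", date(2024, 12, 3), 79.5),
    ("Seed 1.5", date(2025, 1, 22), 91.6),
]


def running_best(rows):
    """Return (eval_date, best_score_so_far) pairs in chronological order."""
    frontier = []
    best = float("-inf")
    # sorted() is stable, so rows sharing a date keep their list order.
    for _model, day, score in sorted(rows, key=lambda row: row[1]):
        best = max(best, score)
        frontier.append((day, best))
    return frontier


frontier = running_best(results)
for day, best in frontier:
    print(day.isoformat(), f"{best}%")
```

Each dot on the chart corresponds to one row of `results`, while the line corresponds to the second element of each pair in `frontier`.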

All results

Showing one canonical row per model.

#	Model	Score	Conditions	Eval date	Source	Flags
1	Seed 1.5	91.6%	—	22 Jan 2025	Self-reported	Primary
2	Nova Pro	86.9%	3-shot · CoT	03 Dec 2024	Self-reported	Primary
3	Claude Opus 3	86.8%	3-shot · CoT	22 Oct 2024	Self-reported	Primary
4	Nova Lite	82.4%	3-shot · CoT	03 Dec 2024	Self-reported	Primary
5	Nova Micro	79.5%	3-shot · CoT	03 Dec 2024	Self-reported	Primary
6	LLaMA 2	51.2%	3-shot	19 Jul 2023	Paper	Primary Verified
