IFEval

#	Model	Score	Conditions	Eval date	Source	Flags
1	Claude Sonnet 3.7 (Thinking)	93.2%	—	24 Feb 2025	Self-reported	Primary
2	Nova Pro	92.1%	0-shot	03 Dec 2024	Self-reported	Primary
3	Llama 3.3	92.1%	—	06 Dec 2024	Self-reported	Primary
4	Command A	90.9%	—	07 Apr 2025	Self-reported	Primary
5	Claude Sonnet 3.7	90.8%	—	24 Feb 2025	Self-reported	Primary
6	Nova Lite	89.7%	0-shot	03 Dec 2024	Self-reported	Primary
7	Seed 1.5	89.5%	0-shot · CoT	22 Jan 2025	Self-reported	Primary
8	Nova Micro	87.2%	0-shot	03 Dec 2024	Self-reported	Primary
9	GPT 4.1	87.0%	—	14 Apr 2025	Self-reported	Primary
10	Mistral Small 3	82.9%	—	30 Jan 2025	Self-reported	Primary
11	Llama 3.2	77.4%	—	25 Sep 2025	Self-reported	Primary
12	Qwen 3.5 27B	76.5%	—	24 Feb 2026	Third-party	Primary Verified
