
IFBench

Instruction Following Benchmark

Measures how reliably a model follows complex multi-constraint instructions, a known weak spot for many otherwise strong models.
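To make "multi-constraint" concrete, here is a minimal sketch of how a response might be checked against several verifiable constraints at once. It is not the IFBench harness: the constraint helpers (`max_words`, `includes`, `no_commas`) and the all-or-nothing pass rule are hypothetical, chosen only for illustration.

```python
from typing import Callable, List

# A constraint is any function that takes the response text and returns True/False.
Constraint = Callable[[str], bool]


def max_words(limit: int) -> Constraint:
    """Hypothetical constraint: the response has at most `limit` words."""
    return lambda text: len(text.split()) <= limit


def includes(keyword: str) -> Constraint:
    """Hypothetical constraint: the response mentions `keyword` (case-insensitive)."""
    return lambda text: keyword.lower() in text.lower()


def no_commas(text: str) -> bool:
    """Hypothetical constraint: the response contains no commas."""
    return "," not in text


def follows_all(response: str, constraints: List[Constraint]) -> bool:
    """Assumed strict rule: the response passes only if every constraint holds."""
    return all(check(response) for check in constraints)


constraints = [max_words(12), includes("benchmark"), no_commas]

good = "This benchmark checks whether every stated constraint is met."
bad = "This benchmark, in short, checks constraints."

print(follows_all(good, constraints))
print(follows_all(bad, constraints))
```

Under this strict rule a single violated constraint fails the whole response, which is one reason multi-constraint prompts can be hard even for models that handle each constraint well in isolation.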

Language · Text · Accuracy · Max 100.0% · Released Jun 2025
Results: 2
Models scored: 2
Top: 76.1% (Qwen 3.5 122B A10B)
Median: 73.2%

Best results

Top primary scores; one row per model.

Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: best score over time. Y-axis: score, 0 to 100. X-axis: Feb 2025 to Apr 2026.]
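The running best score can be sketched as a cumulative maximum over results sorted by evaluation date. The snippet below uses the three rows from the "All results" table. Whether the site's chart includes the self-reported, non-primary row is an assumption here, and `running_best` is a hypothetical helper name.

```python
from datetime import date
from itertools import accumulate

# (evaluation date, score in %) copied from the "All results" table.
results = [
    (date(2026, 5, 20), 79.1),  # Qwen 3.7 Max (self-reported, non-primary)
    (date(2026, 4, 24), 76.1),  # Qwen 3.5 122B A10B
    (date(2025, 2, 15), 70.2),  # Qwen 3.5 35B A3B
]


def running_best(points):
    """Return (date, best score so far) pairs in chronological order."""
    ordered = sorted(points)  # tuples sort by date first
    bests = accumulate((score for _, score in ordered), max)
    return [(day, best) for (day, _), best in zip(ordered, bests)]


frontier = running_best(results)
for day, best in frontier:
    print(day.isoformat(), best)
```

A later result that scores below the current best leaves the line flat, so the frontier never decreases.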

All results

Showing all configurations, including non-primary alternates.
#  Model               Score  Conditions               Eval date    Source         Flags
1  Qwen 3.7 Max        79.1%  0-shot · CoT · standard  20 May 2026  Self-reported  -
2  Qwen 3.5 122B A10B  76.1%  -                        24 Apr 2026  Third-party    Primary, Verified
3  Qwen 3.5 35B A3B    70.2%  -                        15 Feb 2025  Third-party    Primary, Verified