IFBench
Measures how reliably a model follows complex, multi-constraint instructions, a known weak spot for many otherwise strong models.
All results
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Qwen 3.7 Max | 79.1% | 0-shot · CoT · standard | 20 May 2026 | Self-reported | |
| 2 | Qwen 3.5 122B A10B | 76.1% | — | 24 Apr 2026 | Third-party | Primary Verified |
| 3 | Qwen 3.5 35B A3B | 70.2% | — | 15 Feb 2025 | Third-party | Primary Verified |
