TAAFT

MMMU-Pro

A harder variant of MMMU: it filters out questions that can be solved from the text alone and adds a vision-only setting in which the question itself is rendered into the image.

Category: Multimodal | Metric: accuracy (max 100.0%) | Released: Sep 2024
Results: 14
Models scored: 14
Top score: 81.2% (GPT 5.4)
Median: 76.5%
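As a quick cross-check of the summary figures, the sketch below recomputes the result count, top score, and median from the 14 scores in the results table. It assumes the page uses the conventional median (the mean of the two middle values when the count is even); the page itself does not say how the median is computed.

```python
from statistics import median

# Scores (%) copied from the "All results" table on this page, one per model.
SCORES = [81.2, 81.2, 81.0, 80.5, 79.4, 78.4, 76.9,
          76.0, 75.1, 74.5, 68.0, 66.7, 62.7, 59.3]


def summary(scores):
    """Return (count, top score, median) for a list of percentage scores."""
    return len(scores), max(scores), median(scores)


count, top, med = summary(SCORES)
print(f"{count} results, top {top}%, median {med:.2f}%")
```

With 14 scores, the median is the mean of the 7th and 8th values (76.9% and 76.0%), about 76.45%, which rounds to the 76.5% shown in the summary.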


Frontier over time

Each dot is one model result; the line traces the running best score.
[Chart: best score over time; y-axis: score (0 to 100%); x-axis: Jan 2025 to May 2026]
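The "running best" line can be reproduced from the table data with a single pass in date order. This is a minimal sketch, assuming that a result only adds a point to the frontier when it strictly beats the previous best (so a later tie, such as GPT 5.4 matching 81.2%, does not); the page does not state how it treats ties. The rows and dates are copied as listed in the results table.

```python
from datetime import date

# (eval date, model, score %) rows copied from the "All results" table on this page.
RESULTS = [
    (date(2026, 3, 5), "GPT 5.4", 81.2),
    (date(2025, 12, 17), "Gemini 3 Flash (Thinking)", 81.2),
    (date(2025, 11, 18), "Gemini 3 Pro", 81.0),
    (date(2026, 2, 19), "Gemini 3.1 Pro", 80.5),
    (date(2026, 4, 20), "Kimi K2.6", 79.4),
    (date(2025, 8, 7), "GPT 5 (Thinking)", 78.4),
    (date(2026, 4, 3), "Gemma 4", 76.9),
    (date(2026, 5, 5), "GPT 5.5 Instant", 76.0),
    (date(2025, 2, 15), "Qwen 3.5 35B A3B", 75.1),
    (date(2026, 2, 17), "Claude Sonnet 4.6", 74.5),
    (date(2025, 12, 17), "Gemini 2.5 Pro (Thinking)", 68.0),
    (date(2025, 12, 17), "Gemini 2.5 Flash (Thinking)", 66.7),
    (date(2025, 8, 7), "GPT 5", 62.7),
    (date(2025, 1, 22), "Seed 1.5", 59.3),
]


def running_best(results):
    """Return the rows that raise the best score, walking results in date order.

    Assumption: only a strictly higher score extends the frontier; ties are skipped.
    """
    frontier = []
    best = float("-inf")
    # sorted() is stable, so same-day results keep their table order.
    for when, model, score in sorted(results, key=lambda row: row[0]):
        if score > best:
            best = score
            frontier.append((when, model, score))
    return frontier


for when, model, score in running_best(RESULTS):
    print(when.isoformat(), model, f"{score}%")
```

Under these assumptions the frontier steps through Seed 1.5, Qwen 3.5 35B A3B, GPT 5 (Thinking), Gemini 3 Pro, and Gemini 3 Flash (Thinking), ending at 81.2%.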

All results

Showing one canonical row per model.
#   Model                        Score  Conditions  Eval date    Source         Flags
1   GPT 5.4                      81.2%  -           05 Mar 2026  Self-reported  Primary
2   Gemini 3 Flash (Thinking)    81.2%  -           17 Dec 2025  Self-reported  Primary
3   Gemini 3 Pro                 81.0%  CoT         18 Nov 2025  Self-reported  Primary
4   Gemini 3.1 Pro               80.5%  CoT         19 Feb 2026  Self-reported  Primary
5   Kimi K2.6                    79.4%  -           20 Apr 2026  Self-reported  Primary
6   GPT 5 (Thinking)             78.4%  -           07 Aug 2025  Self-reported  Primary
7   Gemma 4                      76.9%  -           03 Apr 2026  Self-reported  Primary
8   GPT 5.5 Instant              76.0%  0-shot      05 May 2026  Self-reported  Primary
9   Qwen 3.5 35B A3B             75.1%  -           15 Feb 2025  Third-party    Primary, Verified
10  Claude Sonnet 4.6            74.5%  -           17 Feb 2026  Self-reported  Primary
11  Gemini 2.5 Pro (Thinking)    68.0%  -           17 Dec 2025  Self-reported  Primary
12  Gemini 2.5 Flash (Thinking)  66.7%  -           17 Dec 2025  Self-reported  Primary
13  GPT 5                        62.7%  -           07 Aug 2025  Self-reported  Primary
14  Seed 1.5                     59.3%  -           22 Jan 2025  Self-reported  Primary