MathVista
MathVista (testmini)
Mathematical reasoning over visual contexts: figures, charts, diagrams, geometric drawings.
Multimodal
Accuracy
Max 100.0%
Released Oct 2023
Results: 8
Models scored: 8
Top: o3 (86.8%)
Median: 71.9%
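The summary figures above appear to be computed over the eight rows flagged Primary in the results table. That reading is an inference from the numbers, not something the page states. A minimal sketch of the arithmetic, with the scores copied by hand from the table and the variable names chosen only for illustration:

```python
from statistics import median

# Scores (%) of the eight rows flagged "Primary" in the results table.
# Assumption: the summary counts only these rows, not the non-primary alternates.
primary_scores = {
    "o3": 86.8,
    "o4 mini": 84.3,
    "Llama 4 Maverick": 73.7,
    "GPT 4.1": 72.0,
    "o1": 71.8,
    "Llama 4 Scout": 70.7,
    "GPT-4o": 63.8,
    "Pixtral 12B": 58.3,
}

# Highest-scoring model and its score.
top_model = max(primary_scores, key=primary_scores.get)
top_score = primary_scores[top_model]

# With an even number of scores, the median is the mean of the
# two middle values (here 72.0 and 71.8), rounded to one decimal.
median_score = round(median(primary_scores.values()), 1)

print(f"Models scored: {len(primary_scores)}")
print(f"Top: {top_model} ({top_score}%)")
print(f"Median: {median_score}%")
```

Including the four non-primary rows would change both the count and the median, which is why the sketch restricts itself to the primary set.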
Best results
Top primary scores; one row per model.
Frontier over time
Each dot is one model result; the line traces the running best score.
All results
Showing all configurations, including non-primary alternates.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | o3 | 86.8% | — | 16 Apr 2025 | Self-reported | Primary |
| 2 | o4 mini | 84.3% | — | 16 Apr 2025 | Self-reported | Primary |
| 3 | Llama 4 Maverick | 73.7% | — | 05 Apr 2025 | Self-reported | Primary |
| 4 | GPT 4.1 | 72.0% | — | 14 Apr 2025 | Self-reported | Primary |
| 5 | o1 | 71.8% | — | 16 Apr 2025 | Self-reported | Primary |
| 6 | Llama 4 Scout | 70.7% | — | 05 Apr 2025 | Self-reported | Primary |
| 7 | Claude Sonnet 3.5 | 67.7% | 0-shot · standard | 20 Jun 2024 | Self-reported | |
| 8 | Gemini 1.5 Pro | 63.9% | 0-shot · standard | 01 May 2024 | Self-reported | |
| 9 | GPT-4o | 63.8% | — | 16 Apr 2025 | Self-reported | Primary |
| 10 | Pixtral 12B | 58.3% | CoT | 10 Oct 2024 | Self-reported | Primary |
| 11 | Gemini Ultra | 53.0% | 0-shot · standard | 06 Dec 2023 | Self-reported | |
| 12 | Claude Haiku 3 | 46.4% | 0-shot · standard | 04 Mar 2024 | Self-reported | |
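The running-best line described under "Frontier over time" can be reproduced from the table as a cumulative maximum over eval dates. The sketch below is one way to do it under stated assumptions: it uses all twelve rows (the page does not say whether the chart is limited to primary results), it collapses results sharing an eval date into one point, and the `running_best` helper is a hypothetical name, not part of the site.

```python
from datetime import date

# (model, score %, eval date) for every row of the table, copied by hand.
results = [
    ("o3", 86.8, date(2025, 4, 16)),
    ("o4 mini", 84.3, date(2025, 4, 16)),
    ("Llama 4 Maverick", 73.7, date(2025, 4, 5)),
    ("GPT 4.1", 72.0, date(2025, 4, 14)),
    ("o1", 71.8, date(2025, 4, 16)),
    ("Llama 4 Scout", 70.7, date(2025, 4, 5)),
    ("Claude Sonnet 3.5", 67.7, date(2024, 6, 20)),
    ("Gemini 1.5 Pro", 63.9, date(2024, 5, 1)),
    ("GPT-4o", 63.8, date(2025, 4, 16)),
    ("Pixtral 12B", 58.3, date(2024, 10, 10)),
    ("Gemini Ultra", 53.0, date(2023, 12, 6)),
    ("Claude Haiku 3", 46.4, date(2024, 3, 4)),
]


def running_best(rows):
    """Return (eval_date, best_score_so_far) per distinct date, oldest first."""
    # Keep only the best score reported on each date (assumption: same-day
    # results are collapsed into a single frontier point).
    best_by_date = {}
    for _, score, day in rows:
        best_by_date[day] = max(score, best_by_date.get(day, score))

    # Walk the dates chronologically, carrying the maximum forward.
    frontier = []
    best = float("-inf")
    for day in sorted(best_by_date):
        best = max(best, best_by_date[day])
        frontier.append((day, best))
    return frontier


for day, best in running_best(results):
    print(f"{day.isoformat()}  {best:.1f}%")
```

A result that scores below the current best, such as Pixtral 12B in October 2024, leaves the line flat; the line only steps up when a new high is reported.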
