TruthfulQA
817 questions designed to elicit imitative falsehoods. Measures whether models repeat common misconceptions.
Knowledge · Text · Accuracy · Max 100.0% · Released Sep 2021
Results: 1
Models scored: 1
Top: Mistral NeMo (50.3%)
Median: 50.3%
Best results
Top primary scores; one row per model.
Frontier over time
Each dot is one model result; the line traces the running best score.
Not enough data to plot a trend yet.
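As a rough illustration of how a running-best line could be derived from dated results, here is a minimal sketch. The function name `running_best` and the `(eval_date, score)` tuple layout are assumptions made for this example, not the page's actual implementation; the only data point used is the single result listed on this page.

```python
from datetime import date


def running_best(results):
    """Return (eval_date, best_score_so_far) pairs in chronological order.

    `results` is an iterable of (eval_date, score) tuples; this layout is
    an assumption for the sketch.
    """
    frontier = []
    best = float("-inf")
    # Sort by evaluation date so the best score accumulates over time.
    for eval_date, score in sorted(results):
        best = max(best, score)
        frontier.append((eval_date, best))
    return frontier


# The only result currently listed: Mistral NeMo, 50.3%, 18 Jul 2024.
results = [(date(2024, 7, 18), 50.3)]
frontier = running_best(results)
```

With a single result the frontier is just that one point, which is consistent with there not being enough data to draw a trend.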
All results
Showing all configurations, including non-primary alternates.
| # | Model | Score | Conditions | Eval date | Source | Flags |
|---|---|---|---|---|---|---|
| 1 | Mistral NeMo | 50.3% | 0-shot | 18 Jul 2024 | Self-reported | Primary |
