TruthfulQA

817 questions designed to elicit imitative falsehoods, measuring whether models repeat common misconceptions.

Knowledge · Text · Accuracy (max 100.0%) · Released Sep 2021

Results

Models scored: 1

Top: Mistral NeMo, 50.3%

Median: 50.3%
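The summary figures above (models scored, top score, median) can be derived from the results table. Below is a minimal sketch, assuming that only rows flagged Primary are counted and that each model contributes its single best primary score. The `Result` record and `summarize` function are hypothetical names for illustration, not the site's actual code or schema.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Result:
    # One leaderboard row; field names are hypothetical, not a real schema.
    model: str
    score: float  # accuracy in percent, 0.0-100.0
    primary: bool


def summarize(rows: list[Result]) -> tuple[int, str, float, float]:
    """Return (models scored, top model, top score, median) over primary rows."""
    best: dict[str, float] = {}
    for r in rows:
        if not r.primary:
            continue  # assumption: non-primary alternates are listed but not ranked
        # Keep one score per model: its best primary result.
        best[r.model] = max(r.score, best.get(r.model, float("-inf")))
    if not best:
        raise ValueError("no primary results")
    top_model = max(best, key=best.__getitem__)
    return len(best), top_model, best[top_model], median(best.values())


rows = [Result("Mistral NeMo", 50.3, True)]
print(summarize(rows))  # (1, 'Mistral NeMo', 50.3, 50.3)
```

With a single scored model, the top score and the median coincide, which matches the figures shown.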

Best results

Top primary score per model: Mistral NeMo, 50.3%.

Score trend (each dot is one model result; the line traces the running best score): not enough data to plot a trend yet.
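The running best score traced by the trend line is a cumulative maximum over results sorted by evaluation date. A minimal sketch follows; `running_best` and `can_plot_trend` are hypothetical helpers, and the two-point minimum for drawing a trend is an assumed threshold, not the site's documented rule.

```python
from datetime import date


def running_best(results: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """Return (date, best score so far) for each result, in date order."""
    points = []
    best = float("-inf")
    for when, score in sorted(results):
        best = max(best, score)  # cumulative maximum
        points.append((when, best))
    return points


def can_plot_trend(results: list[tuple[date, float]], min_points: int = 2) -> bool:
    # Hypothetical threshold: a trend line needs at least two dated results.
    return len(results) >= min_points


results = [(date(2024, 7, 18), 50.3)]
print(running_best(results))  # [(datetime.date(2024, 7, 18), 50.3)]
print(can_plot_trend(results))  # False
```

With only the Mistral NeMo result on record, the sketch yields a single point and declines to draw a trend, consistent with the message above.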

Showing all configurations, including non-primary alternates.

#	Model	Score	Conditions	Eval date	Source	Flags
1	Mistral NeMo	50.3%	0-shot	18 Jul 2024	Self-reported	Primary