Truthfulness

How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions

We create a lie detector for black-box LLMs that works by asking the model a fixed set of questions unrelated to the lie.

Teaching Models to Express Their Uncertainty in Words

We show that a GPT-3 model can learn to express uncertainty about its own answers in natural language, without using model logits.

TruthfulQA: Measuring how models mimic human falsehoods

We propose a benchmark to measure whether a language model generates truthful answers to questions.
