Evaluation

Concept Poisoning: Probing LLMs without probes

A novel evaluation technique that uses concept poisoning to probe LLMs without explicit probes.

New, improved multiple-choice TruthfulQA

We introduce a new multiple-choice version of TruthfulQA that fixes a potential problem with the existing versions (MC1 and MC2).

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

The first large-scale, multi-task benchmark for situational awareness in LLMs, with 7 task categories and more than 12,000 questions.

TruthfulQA: Measuring how models mimic human falsehoods

We propose a benchmark to measure whether a language model is truthful in generating answers to questions.
