Concept Poisoning: Probing LLMs without probes
A novel LLM evaluation technique that uses concept poisoning to probe models without explicit probes.
Our research findings that are not published as papers: shorter research updates, or quick follow-ups on existing papers.
Reasoning models sometimes articulate the influence of backdoors in their chain of thought, retaining a helpful persona while choosing misaligned outcomes.
OpenAI's new Responses API causes finetuned models to behave differently than the Chat Completions API, sometimes dramatically so (see the code sketch after this list).
We introduce a new multiple-choice version of TruthfulQA that fixes a potential problem with the existing versions (MC1 and MC2).
Practical tips on slide-based communication for empirical research with LLMs.
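The Responses API finding above is straightforward to check for yourself. Below is a minimal sketch, assuming the official `openai` Python SDK (v1+); the finetuned model ID and prompt are placeholders for illustration, not values from the original post.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical finetuned model ID -- substitute your own.
MODEL = "ft:gpt-4o-mini-2024-07-18:my-org::example"
PROMPT = "Complete the pattern: A, B, C,"

# Query the finetuned model through the Chat Completions API.
chat = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": PROMPT}],
    temperature=0,
)
print("chat.completions:", chat.choices[0].message.content)

# Query the same model, same prompt, through the Responses API.
resp = client.responses.create(
    model=MODEL,
    input=PROMPT,
    temperature=0,
)
print("responses:", resp.output_text)
```

Sending the identical prompt at temperature 0 through both endpoints makes any behavioral divergence between them easy to spot.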