Blog

Our research findings that are not published as a paper. These are shorter research updates, or quick followups on existing papers.

Concept Poisoning: Probing LLMs without probes

A novel LLM evaluation technique using concept poisoning to probe models without explicit probes

Read More →

Backdoor awareness and misaligned personas in reasoning models

Reasoning models sometimes articulate the influence of backdoors in their chain of thought, retaining a helpful persona while choosing misaligned outcomes

Read More →

OpenAI Responses API changes models' behavior

OpenAI's new Responses API causes finetuned models to behave differently than the Chat Completions API, sometimes dramatically so.

Read More →

New, improved multiple-choice TruthfulQA

We introduce a new multiple-choice version of TruthfulQA that fixes a potential problem with the existing versions (MC1 and MC2).

Read More →

Tips On Empirical Research Slides

Practical tips on slide-based communication for empirical research with LLMs

Read More →