Concept Poisoning: Probing LLMs without probes
A novel LLM evaluation technique that uses concept poisoning to probe models without explicit probes.
Our research findings that are not published as papers: shorter research updates, or quick follow-ups on existing papers.
Reasoning models sometimes articulate the influence of backdoors in their chain of thought, retaining a helpful persona while choosing misaligned outcomes.
OpenAI's new Responses API causes finetuned models to behave differently than the Chat Completions API, sometimes dramatically so (see the code sketch after this list).
We introduce a new multiple-choice version of TruthfulQA that fixes a potential problem with the existing versions (MC1 and MC2).
Practical tips on slide-based communication for empirical research with LLMs.
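The Responses API finding above is straightforward to check for yourself. Below is a minimal sketch, assuming the official `openai` Python SDK (v1+); the finetuned model ID and prompt are placeholders for illustration, not values from the original post.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical finetuned model ID -- substitute your own.
MODEL = "ft:gpt-4o-mini-2024-07-18:my-org::example"
PROMPT = "Complete the pattern: A, B, C,"

# Query the finetuned model through the Chat Completions API.
chat = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": PROMPT}],
    temperature=0,
)
print("chat.completions:", chat.choices[0].message.content)

# Query the same model, same prompt, through the Responses API.
resp = client.responses.create(
    model=MODEL,
    input=PROMPT,
    temperature=0,
)
print("responses:", resp.output_text)
```

Sending the identical prompt at temperature 0 through both endpoints makes any behavioral divergence between them easy to spot.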