Situational awareness

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

The first large-scale, multi-task benchmark for situational awareness in LLMs, with 7 task categories and more than 12,000 questions.

Taken out of context: On measuring situational awareness in LLMs

Situational awareness may emerge unexpectedly as a byproduct of model scaling. We propose 'out-of-context reasoning' as a way to measure this capability.
