Can Language Models Explain Their Own Classification Behavior?

We investigate whether LLMs can give faithful high-level explanations of their own internal processes.