Rylan Schaeffer


Incidental Polysemanticity: A New Obstacle for Mechanistic Interpretability

Victor Lecomte, Kushal Thaman, Rylan Schaeffer, Naomi Bashkansky, Trevor Chow, Sanmi Koyejo

arXiv preprint (under review)

December 2024

Mechanistic Interpretability, Polysemanticity, Neural Networks, AI Safety

Summary

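Incidental polysemanticity, in which a neuron comes to represent multiple unrelated features even when the network has enough neurons to give each feature its own, poses a new obstacle for mechanistic interpretability.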