Open Problems in Mechanistic Interpretability
Sharkey, Chughtai, Batson et al. (2025)
Tags: critical-perspectives, open-problems
Abstract
We outline key open problems in mechanistic interpretability, including challenges related to sparse autoencoders, feature universality, and scaling interpretability methods.