Open Problems in Mechanistic Interpretability

Sharkey, Chughtai, Batson et al. (2025)

Tags: critical-perspectives, open-problems

Abstract

We outline key open problems in mechanistic interpretability, including challenges related to sparse autoencoders, feature universality, and scaling interpretability methods.