Transcoders Beat Sparse Autoencoders for Interpretability
Paulo, Mallen, Belrose (2025)
Tags: transcoder, evaluation
Abstract
We compare transcoders and sparse autoencoders (SAEs) for interpretability and find that transcoders produce more interpretable and more faithful explanations of model behavior.
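
The abstract does not say how the two methods differ, so the following is a minimal sketch of the standard distinction rather than the authors' implementation. An SAE is trained to reconstruct the activation vector it reads. A transcoder has the same encoder/decoder shape but is trained to predict the output of a model component from that component's input. The names `toy_mlp`, `sparse_coder`, `sae_loss`, and `transcoder_loss`, the dimensions, and the random weights are all hypothetical, and the sparsity penalty (or top-k selection) used in practice is omitted for brevity.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def sparse_coder(x, W_enc, W_dec):
    """Encode x into nonnegative latents, then decode them linearly."""
    latents = relu(matvec(W_enc, x))
    return matvec(W_dec, latents), latents

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def toy_mlp(x):
    # Hypothetical stand-in for the model component being explained.
    return [max(0.0, 2.0 * a - 0.5) for a in x]

def sae_loss(x, W_enc, W_dec):
    # SAE target: the same activation vector the coder reads.
    recon, _ = sparse_coder(x, W_enc, W_dec)
    return mse(recon, x)

def transcoder_loss(x, W_enc, W_dec):
    # Transcoder target: the component's output for that input.
    pred, _ = sparse_coder(x, W_enc, W_dec)
    return mse(pred, toy_mlp(x))

# Assumed toy sizes and untrained random weights, for illustration only.
rng = random.Random(0)
d_model, d_latent = 4, 16
W_enc = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(d_latent)]
W_dec = [[rng.gauss(0, 0.5) for _ in range(d_latent)] for _ in range(d_model)]
x = [rng.gauss(0, 1) for _ in range(d_model)]

print("SAE loss:", sae_loss(x, W_enc, W_dec))
print("Transcoder loss:", transcoder_loss(x, W_enc, W_dec))
```

The two loss functions share one forward pass and differ only in the target, which is the whole architectural difference this sketch is meant to show.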