Transcoders Beat Sparse Autoencoders for Interpretability

Paulo, Mallen, Belrose (2025)

Tags: transcoder, evaluation

Abstract

We compare transcoders and sparse autoencoders (SAEs) for interpretability and find that transcoders produce more interpretable and more faithful explanations of model behavior.
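
The comparison rests on a difference in training target: an SAE reconstructs the activation vector it is given, while a transcoder predicts the output of a model component from that component's input. The sketch below illustrates only that difference and is not the authors' implementation. The names `encode_decode`, `sae_loss`, `transcoder_loss`, and `toy_mlp`, the dimensions, and the random weights are all assumptions made for illustration, and the sparsity penalty used in practice is omitted.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def encode_decode(W_enc, W_dec, x):
    """Shared form: ReLU latents in a wider space, then a linear decoder."""
    latents = relu(matvec(W_enc, x))
    return matvec(W_dec, latents), latents

def sae_loss(W_enc, W_dec, x):
    # SAE target: the same activation vector that was fed in.
    recon, _ = encode_decode(W_enc, W_dec, x)
    return mse(recon, x)

def transcoder_loss(W_enc, W_dec, x, component):
    # Transcoder target: the component's output on that input.
    pred, _ = encode_decode(W_enc, W_dec, x)
    return mse(pred, component(x))

# Illustrative sizes and random (untrained) weights; all assumptions.
rng = random.Random(0)
d_model, d_latent = 4, 16
W_enc = rand_matrix(d_latent, d_model, rng)
W_dec = rand_matrix(d_model, d_latent, rng)
W_mlp = rand_matrix(d_model, d_model, rng)

def toy_mlp(x):
    # Hypothetical stand-in for a model component such as an MLP block.
    return relu(matvec(W_mlp, x))

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
print("SAE loss:", sae_loss(W_enc, W_dec, x))
print("Transcoder loss:", transcoder_loss(W_enc, W_dec, x, toy_mlp))
```

Both losses share the same encoder and decoder form; only the target changes. Passing the identity function as `component` makes `transcoder_loss` coincide with `sae_loss`.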