A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
Shu, Wu, Zhao, Rai, Yao, Liu, Du (2025)
Tags: survey
Abstract
A comprehensive survey of sparse autoencoder (SAE) methods for interpreting large language models (LLMs), covering SAE architectures, training methods, evaluation approaches, and applications. Published at EMNLP Findings 2025.
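For orientation, the sketch below shows the common baseline formulation that SAE work builds on. It is not taken from the survey. A ReLU encoder maps model activations into an overcomplete dictionary of features, a linear decoder reconstructs the activations, and an L1 penalty on the codes encourages sparsity. The function names, dimensions, unit-norm decoder initialization, and `l1_coeff` value are illustrative assumptions. Real SAEs are trained on activations collected from an LLM rather than on random data.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations x of shape (batch, d_model) into sparse codes and reconstruct them."""
    # Encoder: subtract the decoder bias, apply an affine map into the dictionary, then ReLU.
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    # Decoder: reconstruct x as a linear combination of dictionary directions (rows of W_dec).
    x_hat = f @ W_dec + b_dec
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff):
    """Reconstruction error plus an L1 sparsity penalty, each averaged over the batch."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return recon + l1_coeff * sparsity

# Illustrative setup: random "activations" and a 4x overcomplete dictionary (assumed sizes).
rng = np.random.default_rng(0)
d_model, d_dict, batch = 16, 64, 8
x = rng.normal(size=(batch, d_model))
W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = W_enc.T.copy()                                  # tied initialization
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)   # unit-norm dictionary rows
b_dec = np.zeros(d_model)

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, f, x_hat, l1_coeff=1e-3)
l0 = np.mean(np.sum(f > 0, axis=1))  # average number of active features per input
print(f.shape, x_hat.shape, loss, l0)
```

The L0 count printed at the end is a simple sparsity measure. It is often reported together with reconstruction quality when comparing SAE variants.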