A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models

Shu, Wu, Zhao, Rai, Yao, Liu, Du (2025)

Tags: survey

Abstract

This survey covers sparse autoencoder (SAE) methods for interpreting large language models (LLMs), including architectures, training methods, evaluation approaches, and applications. It was published in Findings of EMNLP 2025.