Sparse Autoencoder

A portal dedicated to sparse autoencoders in mechanistic interpretability.

Latest Papers

A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models

Shu, Wu, Zhao, Rai, Yao, Liu, Du (2025)

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

Gur-Arieh et al. (2025)

From Mechanistic Interpretability to Mechanistic Biology: SAEs on Protein Language Models

Adams, Bai, Lee, Yu, AlQuraishi (2025)

Gemma Scope 2 (SAEs on the Gemma 3 family)

Google DeepMind (2025)

Interpretability Illusions with Sparse Autoencoders

Anonymous (2025)