Sparse Autoencoders Trained on the Same Data Learn Different Features
Paulo & Belrose (2025)
Tags: evaluation, reproducibility
Abstract
We show that sparse autoencoders (SAEs) trained on the same data with different random seeds learn substantially different features, raising questions about the uniqueness of SAE representations.
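
The abstract does not say how overlap between two feature sets is measured. As a minimal sketch, assume each SAE feature is represented by a direction vector (for example, a decoder row) and that a feature counts as "shared" when its best cosine-similarity match in the other SAE clears a threshold. The function names, the 0.7 threshold, and the toy vectors below are illustrative assumptions, not values or methods taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def shared_fraction(dirs_a, dirs_b, threshold=0.7):
    """Fraction of directions in dirs_a whose best match in dirs_b
    has cosine similarity of at least `threshold` (assumed cutoff)."""
    matched = 0
    for u in dirs_a:
        best = max(cosine(u, v) for v in dirs_b)
        if best >= threshold:
            matched += 1
    return matched / len(dirs_a)


# Toy feature directions standing in for two SAEs trained with different seeds.
seed_0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
seed_1 = [[0.9, 0.1, 0.0], [0.0, 0.6, 0.8], [0.5, 0.5, 0.7]]

# Two of the three seed_0 directions find a match above the threshold.
fraction = shared_fraction(seed_0, seed_1)
```

This nearest-neighbour matching is one-directional and lets several features in one SAE map to the same feature in the other. A one-to-one assignment would give a stricter overlap estimate.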