Sparse Autoencoders Trained on the Same Data Learn Different Features

Paulo & Belrose (2025)

Tags: evaluation, reproducibility

Abstract

We show that sparse autoencoders (SAEs) trained on the same data with different random seeds learn substantially different features, which raises questions about the uniqueness of SAE representations.
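
One way to make "different features" concrete is to compare the decoder directions of two SAEs and count how many features in one have a close counterpart in the other. The sketch below is illustrative only and is not taken from the paper: the function name `shared_feature_fraction`, the 0.7 cosine-similarity threshold, and the random toy matrices standing in for trained decoders are all assumptions.

```python
import numpy as np


def shared_feature_fraction(dec_a, dec_b, threshold=0.7):
    """Fraction of features (rows) in dec_a whose best cosine match in dec_b
    exceeds `threshold`. Both inputs have shape (n_features, d_model).

    Hypothetical helper; the threshold is an assumed value, not a reported one.
    """
    # Normalize each decoder direction to unit length.
    a = dec_a / np.linalg.norm(dec_a, axis=1, keepdims=True)
    b = dec_b / np.linalg.norm(dec_b, axis=1, keepdims=True)
    # Pairwise cosine similarities, shape (n_features_a, n_features_b).
    cos = a @ b.T
    # Best match in dec_b for every feature in dec_a.
    best = cos.max(axis=1)
    return float((best > threshold).mean())


# Toy stand-ins for two decoders (assumption: real decoders would come from
# SAEs trained with different seeds). The second copies 16 of 64 directions
# from the first and draws the remaining 48 at random.
rng = np.random.default_rng(0)
dec_a = rng.normal(size=(64, 128))
dec_b = np.vstack([dec_a[:16], rng.normal(size=(48, 128))])

frac = shared_feature_fraction(dec_a, dec_b)
print(f"fraction of dec_a features with a match in dec_b: {frac:.2f}")
```

Taking the per-feature maximum allows several features in one SAE to match the same feature in the other. A one-to-one assignment would be stricter and would generally report less overlap.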