Scaling and Evaluating Sparse Autoencoders
Gao, Dupré la Tour, Tillman et al. (OpenAI) (2024)
Tags: foundational, scaling, evaluation, openai
Abstract
We train sparse autoencoders with up to 16 million features on GPT-4 activations, and we develop new evaluation methods and scaling laws for SAE training.