BatchTopK Sparse Autoencoders
Bussmann, Leask, Nanda (2024)
Tags: architecture, topk
Abstract
We introduce BatchTopK, a simple activation function for sparse autoencoders (SAEs) that selects the top-k activations across the entire batch rather than separately for each sample, achieving strong performance with minimal hyperparameter tuning.
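
To make the batch-level selection concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that "top-k across the entire batch" means keeping the k * batch_size largest activations in the flattened batch, so that k is the average number of active latents per sample, and that activations are clamped to be non-negative first. The names batch_topk, pre_acts, and k are hypothetical and chosen only for illustration.

```python
import numpy as np


def batch_topk(pre_acts: np.ndarray, k: int) -> np.ndarray:
    """Keep the k * batch_size largest activations across the batch; zero the rest.

    pre_acts has shape (batch_size, n_latents). Assumption: the budget is
    k * batch_size entries for the whole batch, not k entries per row.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Assumption: clamp to non-negative values before selecting.
    acts = np.maximum(pre_acts, 0.0)

    batch_size = acts.shape[0]
    n_keep = min(k * batch_size, acts.size)

    flat = acts.ravel()
    # Indices of the n_keep largest entries of the flattened batch (unordered).
    keep_idx = np.argpartition(flat, -n_keep)[-n_keep:]

    out = np.zeros_like(flat)
    out[keep_idx] = flat[keep_idx]
    return out.reshape(acts.shape)


# Two samples, three latents, k = 1: the budget is 2 activations for the batch.
example = np.array([[5.0, 4.0, 0.5],
                    [1.0, 0.3, 0.2]])
sparse = batch_topk(example, k=1)
active_per_sample = np.count_nonzero(sparse, axis=1)
```

In this example both surviving activations come from the first sample, so the samples end up with two and zero active latents respectively while the batch average stays at k = 1. A per-sample top-k would instead force exactly one active latent in each row.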