BatchTopK Sparse Autoencoders

Bussmann, Leask, Nanda (2024)


Tags: architecture, topk

Abstract

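We introduce BatchTopK, a simple activation function for sparse autoencoders (SAEs) that relaxes the per-sample top-k constraint to the batch level. Instead of keeping the top k activations in each sample, it keeps the top B·k activations across an entire batch of B samples. Each sample therefore uses k active latents on average, while individual samples may use more or fewer. BatchTopK achieves strong performance with minimal hyperparameter tuning.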
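A minimal sketch of the batch-level selection, written in plain Python for clarity rather than speed. It assumes the encoder pre-activations for a batch are already computed as a list of rows (one row per sample). The function name `batch_topk` is hypothetical. Applying a ReLU to the selected values, so that a negative pre-activation never counts as active, is also an assumption of this sketch. Ties are broken by position in the batch.

```python
from typing import List, Tuple


def batch_topk(pre_acts: List[List[float]], k: int) -> List[List[float]]:
    """Keep the top (batch_size * k) pre-activations across the whole batch; zero the rest.

    Hypothetical helper for illustration, not the paper's implementation.
    """
    batch_size = len(pre_acts)
    # Shared budget for the whole batch: k active latents per sample *on average*.
    budget = min(batch_size * k, sum(len(row) for row in pre_acts))

    # Flatten to (value, row, col) so every sample competes for the same budget.
    flat: List[Tuple[float, int, int]] = [
        (v, i, j) for i, row in enumerate(pre_acts) for j, v in enumerate(row)
    ]
    # Largest values first; Python's sort is stable, so ties keep batch order.
    flat.sort(key=lambda t: t[0], reverse=True)

    out = [[0.0] * len(row) for row in pre_acts]
    for v, i, j in flat[:budget]:
        # Assumed ReLU: a selected but negative pre-activation stays inactive.
        out[i][j] = max(v, 0.0)
    return out


if __name__ == "__main__":
    batch = [
        [3.0, 2.0, 1.5, 0.1],  # sample with several strong pre-activations
        [0.2, 0.05, 0.0, 0.3],  # sample with only weak pre-activations
    ]
    print(batch_topk(batch, k=2))
    # [[3.0, 2.0, 1.5, 0.0], [0.0, 0.0, 0.0, 0.3]]
```

With k = 2 and two samples, the budget is four active latents. The first sample takes three and the second takes one, which a per-sample TopK with k = 2 would not allow.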