Scaling Automatic Neuron Description
Choi, Huang, Meng et al. (Transluce, 2024)
Tags: automated-interpretability, scaling
Abstract
We scale automatic neuron description methods to millions of features, improving the quality and efficiency of automated interpretability.
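The abstract does not describe the pipeline itself. Prior automated-interpretability work commonly follows an explain-then-simulate loop. First, collect a neuron's top-activating exemplars. Next, have an explainer model write a description from those exemplars. Finally, score the description by how well a simulator, given only the description, predicts the neuron's true activations, typically using Pearson correlation. The sketch below shows only the exemplar-selection and scoring steps, under that assumption. It is not presented as Transluce's method. The names `top_k_exemplars`, `correlation_score`, and `toy_simulator` are hypothetical, and `toy_simulator` is a keyword-matching stand-in for what would normally be an LLM simulator.

```python
import math
from typing import Sequence


def top_k_exemplars(activations: Sequence[float], texts: Sequence[str], k: int) -> list[str]:
    """Return the k texts with the highest activation (stable sort keeps original order on ties)."""
    order = sorted(range(len(activations)), key=lambda i: -activations[i])
    return [texts[i] for i in order[:k]]


def correlation_score(true_acts: Sequence[float], simulated_acts: Sequence[float]) -> float:
    """Pearson correlation between true and simulated activations; 0.0 if either is constant."""
    n = len(true_acts)
    if n == 0 or n != len(simulated_acts):
        raise ValueError("activation sequences must be non-empty and of equal length")
    mean_t = sum(true_acts) / n
    mean_s = sum(simulated_acts) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(true_acts, simulated_acts))
    var_t = sum((t - mean_t) ** 2 for t in true_acts)
    var_s = sum((s - mean_s) ** 2 for s in simulated_acts)
    if var_t == 0 or var_s == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_t * var_s)


def toy_simulator(description: str, text: str) -> float:
    # Hypothetical stand-in for an LLM simulator: predicts activation 1.0
    # when the description string appears in the text, else 0.0.
    return 1.0 if description.lower() in text.lower() else 0.0


# Toy data: a neuron that appears to fire on mentions of "cat".
texts = ["the cat sat", "dogs bark", "a cat naps", "rain falls"]
acts = [0.9, 0.1, 0.8, 0.0]

exemplars = top_k_exemplars(acts, texts, k=2)  # what an explainer would see
description = "cat"  # stands in for the explainer's output
simulated = [toy_simulator(description, t) for t in texts]
score = correlation_score(acts, simulated)
```

Scoring only through a simulator's predictions keeps the evaluation independent of how the description was produced. In a scaled pipeline, this is what allows cheaper explainer models to be compared directly against more expensive ones.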