Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups
Ghilardi, Belotti, Molinari (2024)
Tags: training-efficiency, scaling
Abstract
We propose training a single sparse autoencoder (SAE) per group of layers, rather than one SAE per layer, reducing computational cost while maintaining feature quality.
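
As a rough illustration of where the savings come from, the sketch below partitions layer indices into groups and pools each group's activations into one training set, so the number of SAEs to train drops from the number of layers to the number of groups. The equal-size contiguous split, the function names `make_layer_groups` and `pool_group_activations`, and the 12-layer / 4-group sizes are all assumptions made for this example; the abstract does not say how groups are chosen.

```python
from typing import Dict, List, Sequence


def make_layer_groups(num_layers: int, num_groups: int) -> List[List[int]]:
    """Split layer indices 0..num_layers-1 into contiguous, near-equal groups.

    Assumption: equal-size contiguous grouping. The source does not
    specify the grouping rule.
    """
    if not 1 <= num_groups <= num_layers:
        raise ValueError("num_groups must be between 1 and num_layers")
    base, extra = divmod(num_layers, num_groups)
    groups: List[List[int]] = []
    start = 0
    for g in range(num_groups):
        # The first `extra` groups absorb one leftover layer each.
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def pool_group_activations(
    acts_by_layer: Dict[int, Sequence[Sequence[float]]],
    group: Sequence[int],
) -> List[Sequence[float]]:
    """Collect the activation vectors of every layer in `group` into one
    training set, on which a single SAE would be trained."""
    pooled: List[Sequence[float]] = []
    for layer in group:
        pooled.extend(acts_by_layer[layer])
    return pooled


# Hypothetical sizes: a 12-layer model split into 4 groups.
groups = make_layer_groups(num_layers=12, num_groups=4)
num_saes_per_layer = 12           # baseline: one SAE per layer
num_saes_grouped = len(groups)    # grouped: one SAE per group (4 here)

# Toy activations: two 3-dimensional vectors per layer.
toy_acts = {layer: [[float(layer)] * 3, [float(layer) + 0.5] * 3] for layer in range(12)}
first_group_data = pool_group_activations(toy_acts, groups[0])
```

In this toy setting the first group covers layers 0 to 2, so its pooled training set holds six vectors, and only 4 SAEs are trained instead of 12.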