Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders
He et al. (2024)
Tags: open-source, tooling, llama
Abstract
We train and release sparse autoencoders on Llama-3.1-8B, extracting millions of interpretable features across all layers of the model.
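A minimal sketch of what a single sparse autoencoder computes, assuming the common ReLU encoder with an L1 sparsity penalty. This is a generic illustration of the technique, not Llama Scope's training code: the paper's exact architecture, loss, initialization, and hyperparameters may differ. The class and parameter names (`SparseAutoencoder`, `d_model`, `n_features`, `l1_coeff`) are hypothetical. The code uses only the standard library, so it runs without extra dependencies.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[i][j]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SparseAutoencoder:
    """Minimal ReLU sparse autoencoder (hypothetical sketch, not Llama Scope's code).

    d_model: width of the model activation being decomposed.
    n_features: dictionary size, typically much larger than d_model.
    """

    def __init__(self, d_model: int, n_features: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.d_model = d_model
        self.n_features = n_features
        # Encoder maps an activation (d_model) to feature pre-activations (n_features).
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        # Decoder maps features back to activation space; column j is feature j's direction.
        self.w_dec = [[rng.gauss(0.0, 0.1) for _ in range(n_features)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x: Vector) -> Vector:
        # Subtract the decoder bias, project, add the encoder bias, then apply ReLU
        # so feature activations are non-negative (and often exactly zero).
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = matvec(self.w_enc, centered)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f: Vector) -> Vector:
        # Reconstruct as a weighted sum of feature directions plus the decoder bias.
        return [r + b for r, b in zip(matvec(self.w_dec, f), self.b_dec)]

    def loss(self, x: Vector, l1_coeff: float = 1e-3) -> float:
        # Squared reconstruction error plus an L1 penalty that favors few active features.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        return mse + l1_coeff * sum(abs(v) for v in f)


sae = SparseAutoencoder(d_model=4, n_features=16, seed=1)
activation = [0.5, -1.0, 2.0, 0.25]
features = sae.encode(activation)
print(len(features), sae.loss(activation))
```

In practice one such autoencoder would be fit by gradient descent on activations collected from a given layer, and each nonzero entry of `encode(x)` is read as one candidate interpretable feature. Because the dictionary is wider than the activation, the ReLU and L1 term are what push most entries to zero for any given input.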