Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

He et al. (2024)

Tags: open-source, tooling, llama

Abstract

We train and release sparse autoencoders on Llama-3.1-8B, extracting millions of interpretable features across all layers of the model.
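As a rough illustration of what "extracting features with a sparse autoencoder" means, here is a minimal sketch of a generic ReLU SAE with an L1 sparsity penalty. The paper's exact architecture, sparsity mechanism and training recipe may differ and are not reproduced here. The class name `SparseAutoencoder`, the helper `matvec`, the `l1_coeff` value and the toy sizes are hypothetical and chosen for illustration only. Real SAEs on Llama-3.1-8B read 4096-dimensional hidden activations and use far wider feature dictionaries. The encoder maps an activation vector into a much larger, mostly-zero feature vector. The decoder reconstructs the activation as a sum of learned feature directions.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector (hypothetical helper)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SparseAutoencoder:
    """Generic ReLU SAE sketch: d_model -> n_features -> d_model (random init, untrained)."""

    def __init__(self, d_model: int, n_features: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Encoder projects activations into an overcomplete feature space.
        self.w_enc: Matrix = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
        self.b_enc: Vector = [0.0] * n_features
        # Each decoder column is one feature's direction in activation space.
        self.w_dec: Matrix = [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(d_model)]
        self.b_dec: Vector = [0.0] * d_model

    def encode(self, x: Vector) -> Vector:
        # Subtract the decoder bias, project, then apply ReLU so features are non-negative.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = matvec(self.w_enc, centered)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f: Vector) -> Vector:
        # Reconstruct the activation as a weighted sum of feature directions plus bias.
        return [r + b for r, b in zip(matvec(self.w_dec, f), self.b_dec)]

    def loss(self, x: Vector, l1_coeff: float = 1e-3) -> float:
        # Squared reconstruction error plus an L1 penalty that pushes most features to zero.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        sparsity = sum(abs(fi) for fi in f)
        return mse + l1_coeff * sparsity


# Usage with toy sizes (illustrative only).
rng = random.Random(1)
sae = SparseAutoencoder(d_model=8, n_features=32)
x = [rng.gauss(0, 1) for _ in range(8)]
features = sae.encode(x)
reconstruction = sae.decode(features)
```

Training would minimize `loss` over many model activations. After training, each nonzero entry of `features` is a candidate interpretable feature for that input.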