Sparse Crosscoders for Cross-Layer Features and Model Diffing
Lindsey, Templeton, Marcus et al. (Anthropic) (2024)
Tags: crosscoder, anthropic
Abstract
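We introduce sparse crosscoders, a variant of sparse autoencoders that reads from and writes to multiple layers of a model at once. The shared features they learn support cross-layer analysis and, when a crosscoder is trained across models, model diffing.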
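To make the idea of one feature dictionary shared across layers concrete, here is a minimal pure-Python sketch of a crosscoder forward pass and loss. It is an illustration under stated assumptions, not the authors' implementation. The function names, argument layout, and the `sparsity_coeff` parameter are hypothetical. It assumes a ReLU latent computed from the sum of per-layer encoder outputs, a separate decoder per layer, and a sparsity penalty that weights each feature's activation by the sum over layers of its decoder norms. For model diffing, the "layers" could plausibly be activations taken from two different models instead.

```python
import math


def crosscoder_forward(acts, W_enc, b_enc, W_dec, b_dec):
    """Run one (hypothetical) crosscoder forward pass.

    acts[l]         : activation vector at layer l (list of floats)
    W_enc[l][i][j]  : encoder weight for layer l, feature i, input dim j
    b_enc[i]        : encoder bias for feature i
    W_dec[l][j][i]  : decoder weight for layer l, output dim j, feature i
    b_dec[l][j]     : decoder bias for layer l, output dim j
    """
    n_layers = len(acts)
    n_feats = len(b_enc)

    # Shared latent: sum the encoder contributions of every layer, then ReLU.
    feats = []
    for i in range(n_feats):
        pre = b_enc[i]
        for l in range(n_layers):
            pre += sum(w * a for w, a in zip(W_enc[l][i], acts[l]))
        feats.append(max(0.0, pre))

    # Each layer is reconstructed from the same latent via its own decoder.
    recon = []
    for l in range(n_layers):
        layer_out = [
            b_dec[l][j] + sum(W_dec[l][j][i] * feats[i] for i in range(n_feats))
            for j in range(len(acts[l]))
        ]
        recon.append(layer_out)
    return feats, recon


def crosscoder_loss(acts, feats, recon, W_dec, sparsity_coeff=1.0):
    """Squared reconstruction error summed over layers, plus a sparsity term.

    The sparsity term weights each feature activation by the sum over layers
    of that feature's decoder L2 norm. sparsity_coeff is an assumed knob.
    """
    mse = sum(
        (a - r) ** 2
        for layer_a, layer_r in zip(acts, recon)
        for a, r in zip(layer_a, layer_r)
    )
    penalty = 0.0
    for i, f_i in enumerate(feats):
        norm_sum = sum(
            math.sqrt(sum(row[i] ** 2 for row in W_dec[l]))
            for l in range(len(W_dec))
        )
        penalty += f_i * norm_sum
    return mse + sparsity_coeff * penalty
```

Summing decoder norms across layers, rather than penalizing each layer separately, is what lets a single feature be "paid for" once while still writing to several layers.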