Sparse Crosscoders for Cross-Layer Features and Model Diffing

Lindsey, Templeton, Marcus et al. (Anthropic) (2024)

Tags: crosscoder, anthropic

Abstract

We introduce crosscoders, a sparse autoencoder variant that learns features shared across multiple layers of a model, or across multiple models, enabling cross-layer analysis and model diffing.
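
To make "shared features across multiple layers" concrete, here is a minimal NumPy sketch of a crosscoder forward pass and loss. It is an illustration under stated assumptions, not the authors' implementation: the function names, array shapes, the ReLU encoder, and the `l1_coeff` value are all hypothetical choices made for this example. The idea it shows is that every layer's activations are encoded into one shared feature vector, and each layer is then reconstructed from that same vector with its own decoder weights.

```python
import numpy as np


def crosscoder_forward(acts, W_enc, b_enc, W_dec, b_dec):
    """Encode activations from several layers into one shared feature vector.

    Assumed (hypothetical) shapes:
      acts:  (n_layers, d_model)              activations at one token position
      W_enc: (n_layers, d_model, n_features)  per-layer encoder weights
      b_enc: (n_features,)
      W_dec: (n_layers, n_features, d_model)  per-layer decoder weights
      b_dec: (n_layers, d_model)
    """
    # Sum the per-layer encoder projections, then apply ReLU.
    pre = np.einsum("ld,ldf->f", acts, W_enc) + b_enc
    feats = np.maximum(pre, 0.0)
    # Every layer is reconstructed from the same shared features.
    recon = np.einsum("f,lfd->ld", feats, W_dec) + b_dec
    return feats, recon


def crosscoder_loss(acts, feats, recon, W_dec, l1_coeff=1e-3):
    """Reconstruction error plus a sparsity penalty weighted by decoder norms."""
    mse = np.sum((acts - recon) ** 2)
    # For each feature, sum its decoder vector norms over layers: (n_features,)
    dec_norms = np.linalg.norm(W_dec, axis=2).sum(axis=0)
    sparsity = np.sum(feats * dec_norms)
    return mse + l1_coeff * sparsity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_layers, d_model, n_features = 3, 8, 16
    acts = rng.normal(size=(n_layers, d_model))
    W_enc = rng.normal(size=(n_layers, d_model, n_features)) * 0.1
    W_dec = rng.normal(size=(n_layers, n_features, d_model)) * 0.1
    b_enc = np.zeros(n_features)
    b_dec = np.zeros((n_layers, d_model))
    feats, recon = crosscoder_forward(acts, W_enc, b_enc, W_dec, b_dec)
    print(feats.shape, recon.shape)
    print(crosscoder_loss(acts, feats, recon, W_dec))
```

For model comparison, the same sketch applies if the leading axis indexes models rather than layers, assuming the compared activations have the same width.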