Toy Models of Superposition

Elhage, Hume, Olsson et al. (Anthropic, 2022)

Tags: foundations, superposition, anthropic

Abstract

We use toy models (small ReLU networks trained on synthetic data with sparse input features) to investigate superposition: the phenomenon in which a neural network represents more features than it has dimensions by encoding features as overlapping combinations of neurons.
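To make "more features than dimensions" concrete, here is a minimal sketch in plain Python of the forward pass of a ReLU-output toy model, h = Wx and x' = ReLU(Wᵀh + b), with n = 5 features packed into m = 2 dimensions. The weights are not trained. As an illustrative assumption, each feature's column of W is hand-placed at a vertex of a regular pentagon, and a single shared bias of -0.4 is chosen by hand. The names `encode`, `decode`, `reconstruct`, `W`, and `BIAS` are hypothetical and do not come from the paper's code.

```python
import math

# Hypothetical illustration: n = 5 features squeezed into m = 2 dimensions.
N_FEATURES = 5
N_DIMS = 2

# W is an m x n matrix; column i is the 2-D direction assigned to feature i.
# Assumption: a hand-chosen regular-pentagon layout, not trained weights.
W = [
    [math.cos(2 * math.pi * i / N_FEATURES) for i in range(N_FEATURES)],
    [math.sin(2 * math.pi * i / N_FEATURES) for i in range(N_FEATURES)],
]

# Assumption: a hand-chosen shared negative bias that zeroes out small
# interference terms (neighbouring directions overlap by cos(72°) ≈ 0.309).
BIAS = -0.4


def relu(v):
    return max(0.0, v)


def encode(x):
    """h = W x: project the n-dimensional feature vector into m dimensions."""
    return [sum(W[d][i] * x[i] for i in range(N_FEATURES)) for d in range(N_DIMS)]


def decode(h):
    """x' = ReLU(W^T h + b): read every feature back out of the m-dim hidden vector."""
    return [
        relu(sum(W[d][i] * h[d] for d in range(N_DIMS)) + BIAS)
        for i in range(N_FEATURES)
    ]


def reconstruct(x):
    return decode(encode(x))


if __name__ == "__main__":
    # Sparse input: only feature 2 is active.
    sparse = [0.0, 0.0, 1.0, 0.0, 0.0]
    print([round(v, 3) for v in reconstruct(sparse)])  # prints [0.0, 0.0, 0.6, 0.0, 0.0]

    # Dense input: every feature active at once; the five directions cancel.
    dense = [1.0] * N_FEATURES
    print([round(v, 3) for v in reconstruct(dense)])  # prints [0.0, 0.0, 0.0, 0.0, 0.0]
```

With only one feature active, the ReLU and negative bias suppress the overlap with every other direction, so the active feature is recovered (scaled down by the bias) and no interference leaks into the others. With all five active, the pentagon directions sum to zero and nothing is recovered. This mirrors the paper's point that packing features into overlapping directions is workable when features are sparse.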