Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small
Chaudhary & Geiger (2024)
evaluation, factual-knowledge
Lieberum, Rajamanoharan, Conmy et al. (DeepMind) (2024)
open-source, tooling, deepmind, gemma
Braun, Taylor, Goldowsky-Dill, Sharkey (2024)
training-efficiency, end-to-end
Rajamanoharan, Conmy, Smith et al. (DeepMind) (2024)
architecture, gated-sae, deepmind
Chalnev, Siu, Conmy (2024)
steering, applications
Simon & Zou (2024)
beyond-text, protein, biology
Rajamanoharan, Lieberum, Sonnerat et al. (DeepMind) (2024)
architecture, jumprelu, deepmind
He et al. (2024)
open-source, tooling, llama
Karvonen et al. (2024)
evaluation, benchmarks
Engels, Liao, Michaud, Gurnee, Tegmark (2024)
representation-geometry, non-linear
Bloom, Tigges, Chanin (2024)
open-source, tooling, library
Choi, Huang, Meng et al. (Transluce) (2024)
automated-interpretability, scaling
Templeton, Conerly, Marcus et al. (Anthropic) (2024)
foundational, scaling, anthropic
Gao, Dupré la Tour, Tillman et al. (OpenAI) (2024)
foundational, scaling, evaluation, openai
Bussmann, Pearce, Leask, Bloom, Sharkey, Nanda (2024)
representation-geometry, meta-sae
Lindsey, Templeton, Marcus et al. (Anthropic) (2024)
crosscoder, anthropic
Marks, Rager, Michaud, Belinkov, Bau, Mueller (2024)
circuits, causal-analysis, applications
Makelov, Lange, Nanda (2024)
evaluation, benchmarks
Dunefsky, Chlenski, Nanda (2024)
transcoder, circuits
Bills, Cammarata, Mossing et al. (OpenAI) (2023)
automated-interpretability, openai