Improving Steering Vectors by Targeting Sparse Autoencoder Features
Chalnev, Siu, Conmy (2024)
Tags: steering, applications
Abstract
We improve steering vectors by targeting sparse autoencoder (SAE) features, producing more targeted and effective interventions for controlling model behavior.