Improving Steering Vectors by Targeting Sparse Autoencoder Features

Chalnev, Siu, Conmy (2024)

Tags: steering, applications

Abstract

We improve steering vectors by using sparse autoencoder (SAE) features to build more targeted and effective interventions for controlling model behavior.