INNSteer
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Beyond Linear Activation Steering: Invertible Latent Transformations for Controlling LLM Behavior
Announce Type: new Abstract: Activation steering provides a lightweight inference-time mechanism for controlling large language models (LLMs) by modifying their internal activation vectors toward desired behaviors. Most existing methods compute a fixed steering direction in the original activation space, typically from pairs of contrastive examples using mean differences, linear probes, or arbitrary separability criteria. While effective to a certain extent, these methods treat behavioral...