Home Knowledge Base INNSteer

INNSteer

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Beyond Linear Activation Steering: Invertible Latent Transformations for Controlling LLM Behavior

Announce Type: new Abstract: Activation steering provides a lightweight inference-time mechanism for controlling large language models (LLMs) by modifying their internal activation vectors toward desired behaviors. Most existing methods compute a fixed steering direction in the original activation space, typically from pairs of contrastive examples using mean differences, linear probes, or arbitrary separability criteria. While effective to a certain extent, these methods treat behavioral...

arXiv CS 1d ago