Closure-Validated Circuit Discovery in Attention
Related Articles from SNS
Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes
arXiv:2606.09607v1 Announce Type: new
Abstract: Interpretability increasingly treats groups of components, not individual units, as the basic object, and proposes to find them by clustering co-activation statistics. We ask whether such a cheap signal actually identifies an attention-head circuit. Adapting a sparse-autoencoder clustering recipe to attention heads, but validating by causal ablation rather than reconstruction, we cluster heads and then run a closure test: ablate the...
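The first step the abstract describes, grouping attention heads by co-activation statistics, can be illustrated with a minimal sketch. This is not the authors' recipe. The paper adapts a sparse-autoencoder clustering method whose details are not given here, and the abstract is cut off before the closure test is fully described. The sketch makes several assumptions. Each head is summarized by one scalar activation per input (for example, the norm of its output). Heads are linked when their Pearson correlation reaches a threshold of 0.8. Clusters are the connected components of that graph. The names `pearson`, `cluster_heads`, and `ACTS` are hypothetical, and the data is synthetic.

```python
# Hypothetical sketch: cluster attention heads by co-activation.
# Assumptions (not from the paper): one scalar activation per head per input,
# Pearson correlation as the co-activation statistic, a fixed threshold, and
# connected components (via union-find) as the clusters.
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_heads(acts: list[list[float]], threshold: float = 0.8) -> list[list[int]]:
    """Group heads whose activations correlate at or above `threshold`.

    acts[h][i] is head h's activation on input i. Returns sorted lists of
    head indices, one list per connected component of the correlation graph.
    """
    n_heads = len(acts)
    parent = list(range(n_heads))

    def find(h: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for a in range(n_heads):
        for b in range(a + 1, n_heads):
            if pearson(acts[a], acts[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two components

    groups: dict[int, list[int]] = {}
    for h in range(n_heads):
        groups.setdefault(find(h), []).append(h)
    return sorted(groups.values())


# Synthetic activations: head 1 tracks head 0, and head 3 tracks head 2.
ACTS = [
    [1.0, 2.0, 3.0, 4.0],  # head 0
    [2.1, 3.9, 6.2, 8.0],  # head 1
    [4.0, 1.0, 3.0, 2.0],  # head 2
    [8.0, 2.1, 6.0, 3.9],  # head 3
]

if __name__ == "__main__":
    print(cluster_heads(ACTS))  # [[0, 1], [2, 3]]
```

In the paper's framing, clusters like these are only proposals. Causal ablation, not the co-activation signal, decides whether a proposed group actually behaves as a circuit.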