Closure-Validated Circuit Discovery in Attention

Related Articles from SNS

Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

arXiv:2606.09607v1. Abstract: Interpretability increasingly treats groups of components, not individual units, as the basic object, and proposes to find them by clustering co-activation statistics. We ask whether such a cheap signal actually identifies an attention-head circuit. Adapting a sparse-autoencoder clustering recipe to attention heads -- but validating by causal ablation rather than reconstruction -- we cluster heads and then run a closure test: ablate the...

arXiv CS 1d ago
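
The "co-activation proposes" step in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's recipe: the head names, the synthetic per-prompt activation scores, the Pearson-correlation signal, the 0.5 threshold, and the connected-components grouping are all hypothetical stand-ins. The closure test is not sketched, because the abstract is cut off before describing it.

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def cluster_heads(acts, threshold=0.5):
    """Group heads whose activation scores correlate at or above `threshold`.

    `acts` maps a (hypothetical) head name to a list of per-prompt scores.
    Heads are linked pairwise and grouped by connected components (union-find).
    """
    heads = list(acts)
    parent = {h: h for h in heads}

    def find(h):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for i, a in enumerate(heads):
        for b in heads[i + 1:]:
            if pearson(acts[a], acts[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for h in heads:
        groups.setdefault(find(h), set()).add(h)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Synthetic data (assumption): two latent signals drive two groups of heads,
# and one head is pure noise.
rng = random.Random(0)
n_prompts = 200
s1 = [rng.gauss(0, 1) for _ in range(n_prompts)]
s2 = [rng.gauss(0, 1) for _ in range(n_prompts)]


def noisy(signal):
    return [v + 0.1 * rng.gauss(0, 1) for v in signal]


acts = {
    "L0H1": noisy(s1),
    "L0H2": noisy(s1),
    "L1H0": noisy(s1),
    "L2H3": noisy(s2),
    "L2H4": noisy(s2),
    "L3H0": [rng.gauss(0, 1) for _ in range(n_prompts)],
}

clusters = cluster_heads(acts)
for group in clusters:
    print(sorted(group))
```

A grouping like this is only a candidate circuit. As the title puts it, the clusters would still need to be checked by ablation before being treated as causal units.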