410M

Related Articles from SNS

Welfare, Improvability, and Variance: A Principal-Agent Approach to Optimal Benchmark Item Aggregation

arXiv:2605.30916v1 (announce type: new). Abstract: AI benchmarks have well-documented limitations, with prior work examining contamination, saturation, and construct underspecification. Aggregation has received far less attention: benchmarks are typically summarized by uniformly averaging item-level scores, implicitly treating every test item as equally valuable. We model benchmarking as a multitask principal-agent game and show that the welfare loss from a benchmark is determined jointly by...

arXiv CS 9d ago

A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting

Announce type: new. Abstract: Recent work shows that language models can transmit behavioural traits through hidden signals in generated data during training. We ask whether a more direct and stricter channel is also viable: can one language model communicate useful intermediate reasoning state to another at inference time by translating and injecting hidden activations, rather than by passing natural-language text?

arXiv CS 7d ago

A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting

arXiv:2606.03280v2 (announce type: replace). Abstract: Recent work shows that language models can transmit behavioural traits through hidden signals in generated data during training. We ask whether a different activation-mediated channel is viable: can one language model communicate a useful intermediate reasoning state to another at inference time through a post-hoc linear activation bridge, rather than through a textual or structured-token relay?

arXiv CS 2d ago

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

arXiv:2605.24059v2 (announce type: replace). Abstract: We present a three-step recipe for identifying attention-head circuits in pretrained transformers. A per-head spectral signal -- the time-integrated participation ratio of each head's attention output -- ranks heads doing sustained content-dependent computation without labels or attribution gradients. A task-pattern screen filters this general indicator into a task-specific candidate circuit, and group ablation against a matched-random...

arXiv CS 5d ago