Pythia-160M to Pythia-410M
Related Articles from SNS
A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting
Abstract: Recent work shows that language models can transmit behavioural traits through hidden signals in generated data during training. We ask whether a more direct and stricter channel is also viable: can one language model communicate useful intermediate reasoning state to another at inference time by translating and injecting hidden activations, rather than by passing natural-language text?
A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting
arXiv:2606.03280v2 (replacement). Abstract: Recent work shows that language models can transmit behavioural traits through hidden signals in generated data during training. We ask whether a different activation-mediated channel is viable: can one language model communicate a useful intermediate reasoning state to another at inference time through a post-hoc linear activation bridge, rather than through a textual or structured-token relay?