Home › Knowledge Base › the Residual Stream

the Residual Stream

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Do Value Vectors in Deep Layers Need Context from the Residual Stream?

Announce Type: new Abstract: The success of the transformer architecture as the backbone of modern LLMs is in large part due to its use of attention layers. An attention layer follows the standard neural network paradigm: it takes the residual stream as input and thereby produces context-dependent query, key, and value vectors. However, we find that model performance meaningfully improves when deeper layers learn only a context-free value vector to preserve the original token information,...

arXiv CS 7d ago

Do Value Vectors in Deep Layers Need Context from the Residual Stream?

Announce Type: replace Abstract: The success of the transformer architecture as the backbone of modern LLMs is in large part due to its use of attention layers. An attention layer follows the standard neural network paradigm: it takes the residual stream as input and thereby produces context-dependent query, key, and value vectors. However, we find that model performance meaningfully improves when deeper layers learn only a context-free value vector to preserve the original token...

arXiv CS 1d ago

Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation

arXiv:2606.03483v1 Announce Type: new Abstract: Hyper-Connections (HC) replace the single Transformer residual stream with multiple streams, introducing a permutation symmetry over stream indices. We study how this symmetry is resolved in practice: whether streams specialize in a balanced way or exhibit dominant-stream usage. Using fine-grained diagnostics for HC-based language models, we trace how multi-stream representations are actually used.

arXiv CS 7d ago

WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers

Announce Type: new Abstract: Residual connections are central to training deep Transformers, but standard PreNorm residual streams aggregate sublayer updates with fixed unit weights. Recent Attention Residuals replace this fixed accumulation with content-dependent depth-wise routing, and Block Attention Residuals make the mechanism efficient by routing over block-level residual summaries. However, a single block summary stores only the low-frequency total residual displacement inside a...

arXiv CS 2d ago

Where does Absolute Position come from in decoder-only Transformers?

arXiv:2606.06160v1 Announce Type: new Abstract: RoPE-trained transformers distinguish absolute position in their attention patterns, even though RoPE encodes only relative offsets in the inner product. We trace this leakage to two architectural components, The causal mask is responsible for the first: its per-query softmax denominator depends on the absolute query position by construction. The residual stream supplies the second.

arXiv CS 5d ago

Hallucinations as Orthogonal Noise: Inference-Time Manifold Alignment via Dynamic Contextual Orthogonalization

Announce Type: new Abstract: Hallucination in Large Language Models (LLMs), characterized by the generation of content inconsistent with contextual facts or logical constraints -- remains a persistent challenge for reliable deployment. In this work, we address this issue through a geometric framework rooted in the linear representation hypothesis. We propose that hallucinations manifest as orthogonal noise relative to the semantic manifold of the residual stream.

arXiv CS 7d ago

Building Better Activation Oracles

arXiv:2606.02609v2 Announce Type: replace Abstract: Activation Oracles (AOs) are promising methods for interpreting residual stream activations. However, current AOs face important issues, such as hallucinations and vagueness. Additionally, text-inversion confounds make them hard to evaluate.

arXiv CS 2d ago

Building Better Activation Oracles

Announce Type: new Abstract: Activation Oracles (AOs) are promising methods for interpreting residual stream activations. However, current AOs face important issues, such as hallucinations and vagueness. Additionally, text-inversion confounds make them hard to evaluate.

arXiv CS 7d ago

TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models

Announce Type: new Abstract: Targeted post-training aims to improve reasoning, math, and code without degrading strengths. Low-rank adapters are efficient but task-global; activation interventions are input-aware but often require separate probes, vectors, or inference-time steering. We introduce TALAN (Task-Aligned Latent Adaptation Networks), a sequence-conditioned latent side path inserted into a transformer's residual stream and co-trained with a low-rank adapter in one SFT loop.

arXiv CS 2d ago

How Context Shapes Truth: Geometric Transformations of Statement-level Truth Representations in LLMs

arXiv:2601.06599v2 Announce Type: replace Abstract: Large Language Models (LLMs) often encode whether a statement is true as a vector in their residual stream activations. These vectors, also known as truth vectors, have been studied in prior work, however how they change when context is introduced remains unexplored.

arXiv CS 1d ago