Home Knowledge Base Dynamic Heads

Dynamic Heads

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

new Abstract: Transformer-based language models are widespread in today's society. As such, understanding the mechanisms by which they solve structured tasks and predicting how they may behave in novel scenarios is of great importance for safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks: a number task requiring positional reasoning and a letter...

arXiv CS 9d ago

From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs

arXiv:2606.09508v1 Announce Type: new Abstract: Existing sparse attention and KV cache compression methods for long-context LLM inference typically apply fixed sparsity patterns or uniform budgets across all attention heads, overlooking the substantial variation in attention behavior among heads and contexts. We observe two distinct entropy patterns among attention heads: Rigid Heads, whose entropy stays near zero across input segments, and Dynamic Heads, whose entropy fluctuates...

arXiv CS 1d ago

Adaptive Head Budgeting for Efficient Multi-Head Attention

arXiv:2604.22583v2 Announce Type: replace Abstract: Multi-head attention enables Transformers to capture diverse representations, but all attention heads are typically activated for every input, regardless of task complexity. For coarse-grained tasks such as text classification, where relevant information is often global, this fixed allocation can introduce unnecessary computation. We propose BudgetFormer, a Transformer architecture that dynamically allocates attention heads on a per-input...

arXiv CS 6d ago

Adaptive Head Budgeting for Efficient Multi-Head Attention

Announce Type: replace Abstract: Multi-head attention enables Transformers to capture diverse representations, but all attention heads are typically activated for every input, regardless of task complexity. For coarse-grained tasks such as text classification, where relevant information is often global, this fixed allocation can introduce unnecessary computation. We propose BudgetFormer, a Transformer architecture that dynamically allocates attention heads on a per-input basis.

arXiv CS 5d ago

Hamiltonian-Inspired Attention Mechanism for Scalable RF Transmitter Fingerprinting

Announce Type: cross Abstract: Radio-frequency (RF) fingerprinting identifies wire-less transmitters using hardware-induced imperfections present in baseband I/Q signals. However, deep learning models often degrade under receiver and channel distribution shifts, particularly as transmitter populations grow. This work proposes the Hamiltonian Transformer, a physics-informed attention architecture that enforces norm preserving value dynamics within each attention head using a learned...

arXiv CS 9d ago

Representation over Routing: Diagnosing Temporal Routing Pathologies in Multi-Timescale PPO

Announce Type: replace Abstract: Temporal credit assignment in reinforcement learning is often approached by introducing value estimates at multiple discount factors. A natural next step is to let the actor dynamically route among these temporal heads, using either differentiable attention or heuristic uncertainty weights. This paper argues that such routing can create a numerical shortcut rather than a reliable temporal abstraction.

arXiv CS 8d ago

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

Announce Type: new Abstract: We present Echo, a proof-of-concept audio system built around a single 25 M-parameter ViT encoder. The encoder is pretrained with a JEPA objective and then specialised by stages to carry speaker identity, phonetic content, and dynamic source routing in the same 512-dimensional latent space, with no per-task fine-tuning at deployment. Light heads handle diarization (ArcFace + VBx) and dynamic source separation (null-target K-set prediction).

arXiv CS 8d ago

Estimating Central, Peripheral, and Temporal Visual Contributions to Human Decision Making in Atari Games

Announce Type: replace Abstract: We study how different visual information sources contribute to human decision making in dynamic visual environments. Using Atari-HEAD, a large-scale Atari gameplay dataset with synchronized eye-tracking, we introduce a controlled ablation framework as a means to reverse-engineer the contribution of peripheral visual information, explicit gaze information in the form of gaze maps, and past-state information from human behavior. We train action-prediction...

arXiv CS 7d ago

Mamba-Enhanced Implicit Motion Learning for Audio-Driven Portrait Animation

Announce Type: new Abstract: Audio-driven human motion video generation aims to synthesize realistic and temporally coherent human animations from a single static image, with applications in talking-head synthesis, co-speech gesture generation, and dynamic presentations. Moving beyond conventional keypoint-based methods that often struggle to capture subtle motion dynamics, We propose a novel implicit-motion framework for generating realistic and temporally coherent human motion videos from...

arXiv CS 7d ago

Mamba-Enhanced Implicit Motion Learning for Audio-Driven Portrait Animation

Announce Type: replace Abstract: Audio-driven human motion video generation aims to synthesize realistic and temporally coherent human animations from a single static image, with applications in talking-head synthesis, co-speech gesture generation, and dynamic presentations. Moving beyond conventional keypoint-based methods that often struggle to capture subtle motion dynamics, We propose a novel implicit-motion framework for generating realistic and temporally coherent human motion videos...

arXiv CS 6d ago