
BudgetFormer


Related Articles from SNS

Adaptive Head Budgeting for Efficient Multi-Head Attention

Announce Type: replace
Abstract: Multi-head attention enables Transformers to capture diverse representations, but all attention heads are typically activated for every input, regardless of task complexity. For coarse-grained tasks such as text classification, where relevant information is often global, this fixed allocation can introduce unnecessary computation. We propose BudgetFormer, a Transformer architecture that dynamically allocates attention heads on a per-input basis.

arXiv CS 5d ago

Adaptive Head Budgeting for Efficient Multi-Head Attention

arXiv:2604.22583v2
Announce Type: replace
Abstract: Multi-head attention enables Transformers to capture diverse representations, but all attention heads are typically activated for every input, regardless of task complexity. For coarse-grained tasks such as text classification, where relevant information is often global, this fixed allocation can introduce unnecessary computation. We propose BudgetFormer, a Transformer architecture that dynamically allocates attention heads on a per-input...

arXiv CS 6d ago
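The abstract excerpts above do not say how BudgetFormer decides which heads to run for a given input. Purely to illustrate the general idea of per-input head budgeting, here is a minimal pure-Python sketch under stated assumptions: a hypothetical complexity score in [0, 1] sets how many heads to use, a hypothetical router's gate scores pick which heads, and only the selected heads are computed. The names `head_budget`, `choose_active_heads`, and `budgeted_mha`, the top-k selection rule, and the complexity-to-budget mapping are all invented for this example and are not the paper's method.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (rows of a times columns of b)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Standard scaled dot-product self-attention for a single head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(wq[0]))  # 1 / sqrt(d_head)
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)


def head_budget(complexity: float, num_heads: int, min_heads: int = 1) -> int:
    """HYPOTHETICAL: map a per-input complexity score in [0, 1] to a head count."""
    return max(min_heads, min(num_heads, math.ceil(complexity * num_heads)))


def choose_active_heads(gate_scores: Sequence[float], budget: int) -> List[int]:
    """HYPOTHETICAL router: keep the `budget` heads with the highest gate scores."""
    ranked = sorted(range(len(gate_scores)), key=lambda h: gate_scores[h], reverse=True)
    return sorted(ranked[:budget])


def budgeted_mha(
    x: Matrix,
    heads: Sequence[Tuple[Matrix, Matrix, Matrix]],
    gate_scores: Sequence[float],
    budget: int,
) -> Tuple[Matrix, List[int]]:
    """Compute only the selected heads; skipped heads are never evaluated and emit zeros.

    The usual output projection W_O is omitted to keep the sketch short.
    """
    active = choose_active_heads(gate_scores, budget)
    per_head: List[Matrix] = []
    for h, (wq, wk, wv) in enumerate(heads):
        if h in active:
            per_head.append(attention_head(x, wq, wk, wv))
        else:
            d_head = len(wv[0])
            per_head.append([[0.0] * d_head for _ in x])
    # Concatenate head outputs along the feature axis, token by token.
    concat = [[val for out in per_head for val in out[t]] for t in range(len(x))]
    return concat, active


# Usage: 4 heads, a mid-complexity input, so half the heads are run.
random.seed(0)
d_model, d_head, n_heads, seq_len = 8, 2, 4, 3


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


heads = [
    (rand_matrix(d_model, d_head), rand_matrix(d_model, d_head), rand_matrix(d_model, d_head))
    for _ in range(n_heads)
]
x = rand_matrix(seq_len, d_model)
budget = head_budget(complexity=0.5, num_heads=n_heads)  # 2 of 4 heads
out, active = budgeted_mha(x, heads, gate_scores=[0.9, 0.1, 0.5, 0.3], budget=budget)
print(active)  # [0, 2]
```

Filling skipped heads with zeros keeps the concatenated width fixed, so any downstream projection sees the same shape for every input; in this sketch the attention work scales with the number of active heads (2 of 4 in the usage example). A real implementation would also need a trained router and a way to batch inputs with different budgets, which the excerpts above do not cover.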