Home Knowledge Base AttnRes

AttnRes

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

DeRes: Decoupling Residual Stability and Adaptivity for Scalable CTR Prediction

arXiv:2606.07980v1 Announce Type: new Abstract: Transformer-based CTR models face a growing bottleneck at the residual connection: under Pre-Norm, early user-interest signals are diluted layer by layer; the identity skip cannot forget stale interests; and each layer sees only its immediate predecessor, losing long-range cross-layer dependencies. Recent attention-based residual variants (AttnRes) address parts of this in language models, but drop the protective identity skip and have not been...

arXiv CS 1d ago

WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers

Announce Type: new Abstract: Residual connections are central to training deep Transformers, but standard PreNorm residual streams aggregate sublayer updates with fixed unit weights. Recent Attention Residuals replace this fixed accumulation with content-dependent depth-wise routing, and Block Attention Residuals make the mechanism efficient by routing over block-level residual summaries. However, a single block summary stores only the low-frequency total residual displacement inside a...

arXiv CS 2d ago