Attention Gates
Related Articles from SNS
Frequency-Domain Latent Attention Gating for Cross-Domain Token Aggregation
arXiv:2606.08191v1 Announce Type: new Abstract: Token aggregation is a common bottleneck in models that map token representations to sample-level predictions, yet most pooling methods operate only in the original token domain. We propose FLaG, a plug-in aggregation module that transforms token representations with the real FFT, summarizes spectral components with learnable latent queries, applies a channel-wise gate, and reconstructs enhanced time-domain tokens for final pooling.
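The FLaG pipeline described above (real FFT, latent-query summarization, channel-wise gate, inverse transform, pooling) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. The function name `flag_style_pool`, the magnitude-based attention over frequency bins, the way gated summaries are mapped back onto the spectrum, and the final mean pooling are all assumptions. The mapping step matters: a channel gate applied identically to every frequency bin would only rescale the tokens, so the sketch makes the scale depend on both frequency and channel.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def flag_style_pool(tokens, latent_queries, gate_w, gate_b):
    """Hypothetical FLaG-style aggregation (illustrative, not the paper's code).

    tokens:         (T, C) real token representations of one sample
    latent_queries: (M, C) learnable latent queries
    gate_w, gate_b: (C, C) and (C,) parameters of a channel-wise sigmoid gate
    Returns (enhanced_tokens (T, C), pooled (C,)).
    """
    T, C = tokens.shape
    spec = np.fft.rfft(tokens, axis=0)             # (F, C) complex, F = T//2 + 1
    mag = np.abs(spec)                             # spectral magnitude per bin/channel
    # Latent queries attend over frequency bins (softmax across bins).
    scores = latent_queries @ mag.T / np.sqrt(C)   # (M, F)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    latents = attn @ mag                           # (M, C) spectral summaries
    # Channel-wise gate in (0, 1) applied to each latent summary.
    gated = latents * sigmoid(latents @ gate_w + gate_b)
    # Assumed reconstruction: send gated summaries back to the bins that
    # produced them, giving a frequency- and channel-dependent scale >= 1.
    scale = 1.0 + attn.T @ gated                   # (F, C)
    enhanced = np.fft.irfft(spec * scale, n=T, axis=0)  # back to (T, C)
    return enhanced, enhanced.mean(axis=0)         # mean pooling (assumed)
```

When the gate is driven close to zero, the scale collapses to 1 and the inverse FFT returns the original tokens, which is a useful sanity check for the round trip.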
Gated Bidirectional Linear Attention for Generative Retrieval
arXiv:2606.07317v2 Announce Type: replace Abstract: In recommender systems, generative retrieval typically uses an encoder-decoder setup: an encoder processes a user interaction history, and an autoregressive decoder then generates recommended items. In large-scale streaming services, active users accumulate very long histories over time.
A Method for Neutron-Gamma Pulse Shape Discrimination of CLYC Detector Based on a Gated Residual-Linear Attention Network
arXiv:2606.02613v1 Announce Type: new Abstract: The discrimination of neutron and gamma pulse shapes is a key technology in fields such as nuclear safety monitoring and radiation assessment. An enhanced recursive gated cyclic residual-sparse linear attention network is developed on the CLYC detector experimental platform to overcome weak noise resistance, limited feature extraction and inferior real-time performance of conventional algorithms. The experimental dataset comprises 19,971...
Learning to Remember, Learn, and Forget in Attention-Based Models
Announce Type: replace Abstract: In-Context Learning (ICL) in transformers acts as an online associative memory and is believed to underpin their high performance on complex sequence processing tasks. However, in gated linear attention models, this memory has a fixed capacity and is prone to interference, especially for long sequences. We propose Palimpsa, a self-attention model that views ICL as a continual learning problem that must address a stability-plasticity dilemma.
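The fixed-capacity memory mentioned in this abstract is easiest to see in the recurrent form of a generic gated linear attention layer. The sketch below is an illustration under stated assumptions (a scalar forget gate per step, no normalization, hypothetical name `gated_linear_attention`) and is not Palimpsa. The state `S` always has shape (d_v, d_k); since at most d_k keys can be mutually orthogonal, writing more key-value pairs than that necessarily makes associations overlap, which is the interference the abstract refers to.

```python
import numpy as np

def gated_linear_attention(q, k, v, g):
    """Recurrent form of a generic gated linear attention layer (not Palimpsa).

    q, k: (T, d_k); v: (T, d_v); g: (T,) forget gates in (0, 1].
    The memory S is a fixed (d_v, d_k) matrix no matter how long T is.
    """
    T, d_k = k.shape
    S = np.zeros((v.shape[1], d_k))
    out = np.empty((T, v.shape[1]))
    for t in range(T):
        # Decay old associations, then write the new key -> value pair.
        S = g[t] * S + np.outer(v[t], k[t])
        # Read: retrieve the value associated with the current query.
        out[t] = S @ q[t]
    return out, S
```

With orthonormal keys and no decay (`g = 1`), querying with the first key returns its value exactly; a gate below 1 shrinks older associations instead, trading stability for plasticity.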
Enhanced Detection of Tiny Objects in Aerial Images
arXiv:2509.17078v3 Announce Type: replace Abstract: While one-stage detectors like YOLOv8 offer fast training speed, they often underperform on detecting small objects as a trade-off. This becomes even more critical when detecting tiny objects in aerial imagery due to low-resolution targets and cluttered backgrounds. To address this, we introduce four enhancement strategies: input image resolution adjustment, data augmentation, attention mechanisms, and an alternative gating function for...
AttnRegDeepLab: A Two-Stage Decoupled Framework for Interpretable Embryo Fragmentation Grading
arXiv:2511.18454v4 Announce Type: replace Abstract: Assessing embryo fragmentation is crucial for predicting IVF success, yet manual grading is prone to subjectivity, and existing AI models struggle with clinical interpretability and segmentation errors. We propose AttnRegDeepLab, a Multi-Task Learning (MTL) framework designed to solve these challenges. The model enhances a DeepLabV3+ decoder with Attention Gates to filter out cytoplasmic noise and retain sharp contour details.
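The abstract does not specify the form of its Attention Gates. A widely used form is the additive attention gate from Attention U-Net (Oktay et al., 2018), sketched below with 1x1 convolutions written as channel-wise matrix multiplies. Treating this as the exact gate used in AttnRegDeepLab is an assumption, as are the function and parameter names and the requirement that the gating signal `g` has already been resampled to the spatial size of the skip features `x`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def additive_attention_gate(x, g, W_x, W_g, b, psi, b_psi):
    """Additive attention gate (Attention U-Net style), 1x1 convs as matmuls.

    x:   (H, W, C_x) skip-connection features to be filtered
    g:   (H, W, C_g) gating signal, assumed already resampled to x's H, W
    W_x: (C_x, C_int), W_g: (C_g, C_int), b: (C_int,)
    psi: (C_int,), b_psi: scalar
    Returns gated features (H, W, C_x) and the attention map (H, W).
    """
    # Project both inputs to a shared space, add, and apply ReLU.
    inter = np.maximum(x @ W_x + g @ W_g + b, 0.0)
    # Collapse to one coefficient per pixel, squashed into (0, 1).
    alpha = sigmoid(inter @ psi + b_psi)
    # Scale every channel of x at each pixel by that pixel's coefficient.
    return x * alpha[..., None], alpha
```

Because each pixel receives a single coefficient in (0, 1), the gate can suppress background responses while passing through features near the structures the decoder is trying to segment.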
CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability
arXiv:2606.01495v2 Announce Type: replace Abstract: We present CART (Context-Anchored Recurrent Transformer), a parameter-efficient language model that reuses a single shared core block R times across depth. Unlike prior looped transformers that recompute key-value tensors at every iteration, CART computes K and V once from a multi-layer prelude and has the recurrent core cross-attend to those frozen tensors via multi-head latent attention. A learned Linear Time-Invariant (LTI) gate keeps the recurrence stable: its spectral...
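The central structural idea in the CART abstract, computing K and V once and reusing one core block R times against them, can be sketched as below. Everything beyond that is assumed for illustration: single-head softmax attention stands in for multi-head latent attention, a `tanh` projection stands in for the core block, and the gate is a hypothetical diagonal linear recurrence whose per-channel decay is kept in (0, 1) by a sigmoid. The abstract is truncated before it describes how CART's LTI gate is parameterized, so this should not be read as the paper's design.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def looped_core_with_frozen_kv(h, context, Wk, Wv, Wq, W_core, theta, R):
    """Hypothetical CART-style loop (single-head attention, not the paper's MLA).

    h:       (T, d) hidden state entering the recurrent core
    context: (S, d) prelude output; K and V are computed from it once
    Wk, Wv, Wq, W_core: (d, d) weights shared by every loop iteration
    theta:   (d,) parameters of an assumed diagonal gate a = sigmoid(theta)
    R:       number of times the shared core block is reused
    """
    K, V = context @ Wk, context @ Wv          # computed once, frozen across loops
    a = 1.0 / (1.0 + np.exp(-theta))           # per-channel decay in (0, 1)
    for _ in range(R):
        attn = softmax((h @ Wq) @ K.T / np.sqrt(K.shape[1]))  # (T, S) cross-attention
        update = np.tanh((attn @ V) @ W_core)                  # same weights every loop
        # Linear recurrence h_{r+1} = a * h_r + (1 - a) * u_r. Its state matrix
        # is diag(a), with spectral radius max(a) < 1, so this part is stable.
        h = a * h + (1.0 - a) * update
    return h
```

In this sketch each step is a convex combination of the previous state and a bounded update, so a state that starts in [-1, 1] stays there for any R, and only the projections of `h` change from loop to loop while K and V do not.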