Attention Gates

Related Articles from SNS

Frequency-Domain Latent Attention Gating for Cross-Domain Token Aggregation

arXiv:2606.08191v1 Announce Type: new Abstract: Token aggregation is a common bottleneck in models that map token representations to sample-level predictions, yet most pooling methods operate only in the original token domain. We propose FLaG, a plug-in aggregation module that transforms token representations with the real FFT, summarizes spectral components with learnable latent queries, applies a channel-wise gate, and reconstructs enhanced time-domain tokens for final pooling.
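
The pipeline named in this abstract (real FFT, latent-query summary, channel-wise gate, reconstruction, pooling) can be illustrated with a minimal standard-library sketch. It is not the FLaG implementation: the function name `flag_like_pool`, the use of spectral magnitudes as attention features, the sigmoid gate, and the final mean pooling are all assumptions made for illustration.

```python
import cmath
import math


def rfft(signal):
    """Naive real DFT: keep only the n//2 + 1 non-negative frequencies."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n // 2 + 1)]


def irfft(spectrum, n):
    """Inverse of rfft for a real signal of length n."""
    out = []
    for t in range(n):
        total = spectrum[0].real
        for k in range(1, len(spectrum)):
            term = (spectrum[k] * cmath.exp(2j * math.pi * k * t / n)).real
            # Every bin except DC and (for even n) Nyquist has a mirrored twin.
            total += term if (n % 2 == 0 and k == n // 2) else 2 * term
        out.append(total / n)
    return out


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def flag_like_pool(tokens, latent_queries, gate_weight, gate_bias):
    """Hypothetical sketch. tokens: T x C; latent_queries: M x C; gate_*: length C."""
    n_tokens, n_channels = len(tokens), len(tokens[0])
    # 1. Real FFT along the token axis, one channel at a time.
    spectra = [rfft([tok[c] for tok in tokens]) for c in range(n_channels)]
    n_bins = len(spectra[0])
    # Magnitude feature per frequency bin (assumed feature choice): F x C.
    mags = [[abs(spectra[c][f]) for c in range(n_channels)] for f in range(n_bins)]
    # 2. Each latent query attends over the frequency bins and summarizes them.
    summaries = []
    for q in latent_queries:
        scores = [sum(qc * mc for qc, mc in zip(q, mag)) / math.sqrt(n_channels)
                  for mag in mags]
        weights = softmax(scores)
        summaries.append([sum(w * mag[c] for w, mag in zip(weights, mags))
                          for c in range(n_channels)])
    # 3. Channel-wise sigmoid gate from the averaged latent summary (assumed form).
    mean_summary = [sum(s[c] for s in summaries) / len(summaries)
                    for c in range(n_channels)]
    gate = [1 / (1 + math.exp(-(gate_weight[c] * mean_summary[c] + gate_bias[c])))
            for c in range(n_channels)]
    # 4. Gate the spectrum, then reconstruct time-domain tokens.
    rebuilt = [irfft([gate[c] * z for z in spectra[c]], n_tokens)
               for c in range(n_channels)]
    # 5. Final pooling: mean over tokens for each channel.
    pooled = [sum(rebuilt[c]) / n_tokens for c in range(n_channels)]
    return pooled, gate
```

In this simplified form the gate is a single scalar per channel, so gating the spectrum is equivalent to scaling each channel in the token domain; a per-frequency gate would be needed to reshape the spectrum itself.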

arXiv CS 1d ago

Gated Bidirectional Linear Attention for Generative Retrieval

arXiv:2606.07317v2 Announce Type: replace Abstract: In recommender systems, generative retrieval typically uses an encoder-decoder setup: an encoder processes a user interaction history, and an autoregressive decoder then generates recommended items. In large-scale streaming services, active users accumulate very long histories over time.

arXiv CS 1d ago

A Method for Neutron-Gamma Pulse Shape Discrimination of CLYC Detector Based on a Gated Residual-Linear Attention Network

arXiv:2606.02613v1 Announce Type: new Abstract: The discrimination of neutron and gamma pulse shapes is a key technology in fields such as nuclear safety monitoring and radiation assessment. An enhanced recursive gated cyclic residual-sparse linear attention network is developed on the CLYC detector experimental platform to overcome weak noise resistance, limited feature extraction and inferior real-time performance of conventional algorithms. The experimental dataset comprises 19,971...

arXiv Physics 7d ago

Learning to Remember, Learn, and Forget in Attention-Based Models

Announce Type: replace Abstract: In-Context Learning (ICL) in transformers acts as an online associative memory and is believed to underpin their high performance on complex sequence processing tasks. However, in gated linear attention models, this memory has a fixed capacity and is prone to interference, especially for long sequences. We propose Palimpsa, a self-attention model that views ICL as a continual learning problem that must address a stability-plasticity dilemma.
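
The fixed-capacity memory mentioned in this abstract can be illustrated with a generic gated linear attention recurrence, in which a d_k x d_v state matrix is decayed by a gate and then updated with the outer product of the current key and value. This is a textbook-style sketch, not Palimpsa; the scalar per-step gate and the function name `gated_linear_attention` are assumptions.

```python
def gated_linear_attention(queries, keys, values, gates):
    """Run a generic gated linear attention recurrence over a sequence.

    State update: S_t = g_t * S_{t-1} + outer(k_t, v_t)
    Readout:      o_t = S_t^T q_t
    The state has a fixed d_k x d_v size regardless of sequence length.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, g in zip(queries, keys, values, gates):
        # Decay the old memory, then write the new key-value association.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = g * state[i][j] + k[i] * v[j]
        # Read the memory with the current query.
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs, state
```

With orthogonal keys `[1, 0]` and `[0, 1]`, values `[1, 2]` and `[3, 4]`, and all gates set to 1, querying with `[1, 0]` after both writes returns `[1.0, 2.0]`. Writing a third pair that reuses key `[1, 0]` with value `[5, 6]` makes the same query return `[6.0, 8.0]`, a small instance of the interference the abstract describes.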

arXiv CS 6d ago

Enhanced Detection of Tiny Objects in Aerial Images

arXiv:2509.17078v3 Announce Type: replace Abstract: While one-stage detectors like YOLOv8 offer fast training speed, they often underperform at detecting small objects as a trade-off. This becomes even more critical when detecting tiny objects in aerial imagery due to low-resolution targets and cluttered backgrounds. To address this, we introduce four enhancement strategies: input image resolution adjustment, data augmentation, attention mechanisms, and an alternative gating function for...

arXiv CS 1d ago

AttnRegDeepLab: A Two-Stage Decoupled Framework for Interpretable Embryo Fragmentation Grading

arXiv:2511.18454v4 Announce Type: replace Abstract: Assessing embryo fragmentation is crucial for predicting IVF success, yet manual grading is prone to subjectivity, and existing AI models struggle with clinical interpretability and segmentation errors. We propose AttnRegDeepLab, a Multi-Task Learning (MTL) framework designed to solve these challenges. The model enhances a DeepLabV3+ decoder with Attention Gates to filter out cytoplasmic noise and retain sharp contour details.
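
The abstract does not say which attention gate formulation AttnRegDeepLab uses. As an illustration only, the sketch below follows the widely used additive form: skip features and a gating signal are projected into a shared space, passed through a ReLU, collapsed to a scalar, and squashed by a sigmoid into a per-position coefficient that rescales the skip features. The function name `attention_gate`, the parameter names, and the assumption that both inputs already share the same spatial size are hypothetical.

```python
import math


def attention_gate(skip, gating, w_x, w_g, psi, b=0.0, b_psi=0.0):
    """Additive attention gate over a list of per-position feature vectors.

    skip[p], gating[p]: feature vectors at position p (same positions assumed)
    w_x, w_g: matrices (lists of rows) projecting each input into a shared space
    psi: vector collapsing that shared space to one scalar score
    """
    def matvec(matrix, vector):
        return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

    gated, alphas = [], []
    for x, g in zip(skip, gating):
        # Combine both projections, then apply ReLU.
        hidden = [max(0.0, px + pg + b)
                  for px, pg in zip(matvec(w_x, x), matvec(w_g, g))]
        # Collapse to a scalar and squash to an attention coefficient in (0, 1).
        score = sum(p * h for p, h in zip(psi, hidden)) + b_psi
        alpha = 1.0 / (1.0 + math.exp(-score))
        alphas.append(alpha)
        # Scale the skip features; low alpha suppresses that position.
        gated.append([alpha * xi for xi in x])
    return gated, alphas
```

Positions where the gating signal agrees with the skip features receive a coefficient near 1 and pass through almost unchanged, while the remaining positions are attenuated.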

arXiv CS 1d ago

CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability

arXiv:2606.01495v2 Announce Type: replace Abstract: We present CART (Context-Anchored Recurrent Transformer), a parameter-efficient language model that reuses a single shared core block R times across depth. Unlike prior looped transformers that recompute key-value tensors at every iteration, CART computes K and V once from a multi-layer prelude and has the recurrent core cross-attend to those frozen tensors via multi-head latent attention. A learned Linear Time-Invariant (LTI) gate keeps the recurrence stable: its spectral...

arXiv CS 6d ago
