Information Transformation Block

Related Articles from SNS

Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models

Announce type: new. Abstract: Causal tracing of factual recall has been studied predominantly in dense transformer language models, where interventions localize information flow to layers or feed-forward modules. Sparse mixture-of-experts (MoE) language models introduce a sharper question: when a factual prediction is mediated by a routed MoE block, which routed expert contributions matter? We formulate expert-aware causal tracing for sparse MoE language models.

arXiv CS 7d ago

DBHN-Net: Dual-Branch Hybrid Neural Network For Low-Complexity Monaural Speech Enhancement

arXiv:2606.05911v1 (announce type: new). Abstract: Although artificial neural network (ANN) based speech enhancement (SE) methods demonstrate excellent performance, their high computational complexity and high energy consumption hinder deployment in practical front-end processing tasks. Spiking neural networks (SNNs) have recently shown potential for reducing power consumption. However, the discrete binary activation and complex spatio-temporal dynamics of SNNs often result in...

arXiv CS 5d ago

QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer

Announce type: new. Abstract: Estimating 3D attributes directly from images has advanced rapidly with the Visual Geometry Grounded Transformer (VGGT), which predicts camera parameters, depth maps, and point clouds in a single forward pass. However, its 1.2B-parameter scale severely limits deployment on resource-constrained platforms such as UAVs and mobile AR devices. To address this limitation, we introduce QVGGT, a tailored quantization framework designed to compress VGGT.

arXiv CS 9d ago

GPT-2: Too Dangerous To Release (2019)

The Difference between GPT-1 and GPT-2: GPT-2 is a direct scale-up of GPT-1, with more parameters and trained on more data. However, OpenAI deemed it too dangerous to release: "Due to our concerns about malicious applications of the technology, we are not releasing the trained model."

Hacker News 1d ago

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

arXiv:2606.08788v1 (announce type: new). Abstract: Representation alignment with pretrained vision models has recently shown strong potential for accelerating diffusion transformer training. By aligning intermediate diffusion features with clean-image representations from self-supervised vision encoders, existing methods improve convergence and generation quality. However, such alignment also introduces a non-trivial constraint: diffusion models operate on noisy inputs whose usable information...

arXiv CS 1d ago

How Much of a Model Do We Need? Redundancy and Slimmability in Remote Sensing Foundation Models

arXiv:2601.22841v2 (announce type: replace). Abstract: Large-scale foundation models (FMs) in remote sensing (RS), denoted RS FMs, are developed following paradigms established in computer vision (CV), yet the validity of transferring CV scaling laws to RS has not been systematically examined. We hypothesize that RS FMs enter an overparameterized regime at substantially smaller scales than their CV counterparts, with task-relevant information encoded redundantly across model dimensions. To...

arXiv CS 7d ago

Signed Dual Attention: Capturing Signed Dependencies in Time Series Forecasting

arXiv:2606.04833v1 (announce type: new). Abstract: Initially developed for natural language processing, Transformer architectures and attention mechanisms are now central to a wide range of deep learning models, including applications in time series forecasting. A standard attention mechanism, however, implicitly assumes homophilic interactions, limiting its ability to model data with positive and negative dependencies, such as time series. In this work, we introduce the Signed Dual Attention,...

arXiv CS 6d ago

Microsoft's new quantum chip is 1,000 times more reliable than its predecessor — but why is this new chip so controversial?

The Majorana 2 quantum processor is built from topological qubits, and its creators claim it can sustain quantum coherence for an average of 20 seconds, orders of magnitude longer than the milliseconds that conventional chips last. Microsoft has revealed a new quantum computing chip with quantum bits (qubits) it says are capable of maintaining their quantum state for...

Live Science 6d ago

Forget Attention: Importance-Aware Attention Is All You Need

Announce type: new. Abstract: Combining attention's global retrieval with the sequential importance signal of state space models (SSMs) is the open challenge of hybrid language modeling. Transformers see everywhere but cannot prioritize; SSMs know what matters but cannot revisit. Existing hybrids -- Jamba (block level) and Hymba (head level) -- place the two in separate compartments, so neither informs the other during the attention computation itself.

arXiv CS 8d ago

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Announce type: replace. Abstract: Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a wide range of generative outcomes. We identify a fundamental trade-off in current approaches to diversity: modifying model inputs requires costly optimization to...

arXiv CS 6d ago