Dual-Stream

Related Articles from SNS

Dual-Stream MLP is All You Need for CTR Prediction

arXiv:2606.04944v1 Announce Type: new Abstract: Click-through rate (CTR) prediction holds a pivotal role in online advertising and recommendation systems, where even small improvements can significantly boost revenue. Existing research primarily focuses on designing dual-stream architectures to capture effective complex feature interactions from both explicit and implicit perspectives. However, these approaches are faced with two major challenges: 1) the high complexity of feature...

arXiv CS 6d ago

HybridCodec: Fast Dual-Stream, Semantically Enhanced Neural Audio Codec

arXiv:2606.06743v1 Announce Type: new Abstract: The popularity of neural audio codecs as speech tokenizers has surged with the advent of Multimodal Large Language Models. New codec architectures with semantic and acoustic disentanglement have emerged. There are two main approaches to introduce semantic information into codec models: one distills semantic information from SSL representations into the first RVQ layer, while the other maintains separate streams for semantic and acoustic features.

arXiv CS 2d ago

DuoGesture: Neuro-Inspired and Biomechanically Informed Dual-Stream Co-Speech Gesture Generation

arXiv:2605.26236v2 Announce Type: replace Abstract: Co-speech gesture generation requires both semantic expressivity and biomechanically plausible rhythmic motion. Existing holistic gesture models mix lexically grounded semantic gestures with frequent prosody-aligned beat gestures. This limits semantic grounding, speech-motion alignment, and kinematic smoothness.

arXiv CS 5d ago

Lethe: Adapter-Augmented Dual-Stream Update for Persistent Knowledge Erasure in Federated Unlearning

Announce Type: replace Abstract: Federated unlearning (FU) aims to erase designated client-level, class-level, or sample-level knowledge from a global model. Existing studies commonly assume that the collaboration ends with the unlearning operation, overlooking the follow-up situation where federated training continues over the remaining data. We identify a critical failure mode, termed knowledge resurfacing, by revealing that continued training can re-activate unlearned knowledge and cause...

arXiv CS 7d ago

Lethe: Principled Dual-Stream Update for Persistent Knowledge Erasure in Federated Unlearning

arXiv:2601.22601v3 Announce Type: replace Abstract: Federated unlearning (FU) aims to erase knowledge from a global model. Existing studies commonly assume that federated collaboration terminates after unlearning, overlooking a deployment-realistic scenario where training continues on the remaining clients after deletion requests are fulfilled. In this work, we identify a critical failure mode, termed knowledge resurfacing, revealing that continued training on retained data alone can...

arXiv CS 6d ago

GlucoFM: A Dual-Stream Foundation Model for Continuous Glucose Monitoring

arXiv:2605.30865v1 Announce Type: new Abstract: Continuous glucose monitoring (CGM) provides a dense view of daily metabolic physiology, yet existing generic time-series and CGM-specific foundation models often encode glucose traces as entangled single-stream sequences, leaving the distinct temporal structure of glycemic dynamics only implicitly modeled. We present GlucoFM, a lightweight CGM foundation model that aligns irregular recordings to a 24-hour chronological grid, preserves...

arXiv CS 9d ago

PRISM: Synergizing Vision Foundation Models via Self-organized Expert Specialization

Announce Type: new Abstract: Unifying the complementary strengths of diverse Vision Foundation Models (VFMs) into a single efficient model is highly desirable but challenged by the negative transfer inherent in monolithic distillation. To address these feature conflicts, we introduce PRISM, a novel dual-stream Mixture-of-Experts (MoE) framework that synergizes VFMs via modular specialization.

arXiv CS 7d ago

Policy-based Foveated Imaging and Perception

Announce Type: new Abstract: Ultra-high-resolution image sensors offer the potential to capture fine spatial details critical for many visual perception tasks, but acquiring and processing all pixels at full resolution is often infeasible under realistic bandwidth, latency, and power constraints. Existing approaches address this challenge through acquisition strategies such as spatial or temporal downsampling, which irrevocably discard information before task relevance can be assessed. In...

arXiv CS 8d ago

dots.tts Technical Report

arXiv:2606.07080v1 Announce Type: new Abstract: We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that models speech in a continuous latent space. Compared with existing continuous autoregressive models, our key innovations are threefold. First, we train an AudioVAE with multiple objectives to build a semantically structured and prediction-friendly continuous speech space.

arXiv CS 2d ago

iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning

arXiv:2605.31096v1 Announce Type: new Abstract: While visually grounded Chain-of-Thought (CoT) has emerged as a promising paradigm to enhance fine-grained perception in multimodal large language models (MLLMs), its efficacy during the inference phase remains underexplored. In this work, we empirically find that mandating explicit object boxes in visually grounded CoT during inference often degrades performance compared to standard textual CoT, which reasons without explicit visual grounding....

arXiv CS 9d ago