Modality Dynamics
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
DyCo-RL: Dynamic Cross-Modal Coordination for Visual Reasoning
Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a leading paradigm for enhancing visual reasoning in Multimodal Large Language Models (MLLMs). However, existing RLVR methods optimize primarily for the reasoning outcome, fundamentally overlooking the fine-grained cross-modal coordination required during the generation process. Through token-level analyses and controlled interventions, we reveal that during Chain-of-Thought (CoT) reasoning,...
TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics
Announce Type: replace Abstract: We present TokaMind, to our knowledge the first open-source foundation model for tokamak plasma dynamics, based on a Multi-Modal Transformer (MMT) and pretrained on heterogeneous diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model...
TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics
Announce Type: replace-cross Abstract: We present TokaMind, to our knowledge the first open-source foundation model for tokamak plasma dynamics, based on a Multi-Modal Transformer (MMT) and pretrained on heterogeneous diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model...
Finite-inertia effects in Langevin dynamics of a lopsided elastic dumbbell using exponential-time differencing schemes
arXiv:2605.31078v1 Announce Type: cross Abstract: Inertia effects in the Langevin dynamics of a lopsided elastic dumbbell are investigated using exponential-time-differencing (ETD) integrators for the corresponding stiff stochastic equations at small mass limit. Starting from the bead-level underdamped Langevin model, we formulate the dynamics in modal coordinates, highlighting two distinct friction scales: an additive friction $\zeta_{\rm trans}=\zeta_1+\zeta_2$ controlling translation...
Dynamic Interaction-Aware and Causality-Disentangled Framework for Multimodal Sentiment Analysis
arXiv:2605.30994v1 Announce Type: new Abstract: Although Multimodal Sentiment Analysis (MSA) effectively leverages rich information from language, visual, and acoustic modalities, existing methods still face two core challenges: 1) static conflict suppression mechanisms fail to adapt to dynamic variations across samples, and 2) the inherent sentimental bias within the language modality, which can misguide learning from other modalities, remains entangled. To this end, we propose a Dynamic...
Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding
arXiv:2606.09331v1 Announce Type: new Abstract: Omni-modal retrieval promises a single embedding space for text, image, video, document, and audio inputs, but building such a unified retriever is difficult since these modalities differ in data distribution, architecture, and optimization dynamics. In this work, we present Conan-embedding-v3, a decouple--fuse--recover framework for omni-modal retrieval. Conan-embedding-v3 first trains modality specialists independently and fuses their task...
DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance
Announce Type: new Abstract: Current end-to-end autonomous driving systems predominantly rely on frame-based sensors, which suffer from inherent perception latency and motion blur during highly dynamic encounters, specifically sudden pedestrian crossings. To address this critical safety vulnerability, we propose DeepIPCv3, a novel multi-modal autonomous navigation framework that synergizes the dense 3D spatial geometry of LiDAR point clouds with the microsecond-level asynchronous event...
MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference
arXiv:2605.05225v3 Announce Type: replace Abstract: Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load balancing methods fail to address two unique challenges: (1) Information Heterogeneity, where numerous redundant visual tokens are treated equally to semantically critical ones,...
FAWAM: Force-Aware World Action Models for Closed-Loop Contact-Rich Manipulation
Announce Type: new Abstract: Force signals provide critical interaction cues for contact-rich robotic manipulation. However, existing methods mostly use force as an additional observation modality, without fully exploiting its role in modeling future interaction dynamics or guiding execution-time feedback correction.
JI-ADF: Joint-Individual Learning with Adaptive Decision Fusion for Multimodal Skin Lesion Classification
Announce Type: replace Abstract: Skin lesion classification is essential for early dermatological diagnosis, yet many existing computer-aided systems rely primarily on dermoscopic images and underutilize the multimodal evidence routinely available in clinical practice. To address this gap, we propose \textbf{JI-ADF}, a trimodal deep learning framework that integrates dermoscopic images, clinical photographs, and structured patient metadata for clinically grounded skin lesion classification....