
Modality Dynamics

Related Articles from SNS

DyCo-RL: Dynamic Cross-Modal Coordination for Visual Reasoning

Announce Type: new
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a leading paradigm for enhancing visual reasoning in Multimodal Large Language Models (MLLMs). However, existing RLVR methods optimize primarily for the reasoning outcome, fundamentally overlooking the fine-grained cross-modal coordination required during the generation process. Through token-level analyses and controlled interventions, we reveal that during Chain-of-Thought (CoT) reasoning,...

arXiv CS 1d ago
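The DyCo-RL abstract contrasts outcome-level rewards with the token-level cross-modal coordination that happens during generation. The sketch below is a generic illustration of an outcome-only verifiable reward, not DyCo-RL's method. The `Answer: <value>` output format and the function name `outcome_reward` are hypothetical assumptions. The sketch shows why such a reward cannot distinguish grounded from ungrounded reasoning when both reach the same answer.

```python
import re


def outcome_reward(completion: str, reference: str) -> float:
    """Binary verifiable reward that inspects only the final answer.

    Assumes (hypothetically) that the model ends its chain of thought with a
    final line of the form 'Answer: <value>'. Everything before that line,
    including how the reasoning used the image, is ignored by this reward.
    """
    # '.+?' cannot cross a newline, so this only matches on the last line.
    match = re.search(r"Answer:\s*(.+?)\s*$", completion.strip())
    if match is None:
        return 0.0  # unparseable output earns no reward
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == reference.strip().lower() else 0.0


# Two chains of thought with very different grounding in the image receive
# the same reward, because only the outcome is checked.
grounded = "The red block sits left of the cup, so it is nearer.\nAnswer: left"
ungrounded = "Guessing without looking at the image.\nAnswer: left"
print(outcome_reward(grounded, "left"), outcome_reward(ungrounded, "left"))  # 1.0 1.0
```

Any signal about how the chain of thought actually used the visual input would have to come from a finer-grained reward or analysis, which is the gap the DyCo-RL abstract points to.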

TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics

Announce Type: replace
Abstract: We present TokaMind, to our knowledge the first open-source foundation model for tokamak plasma dynamics, based on a Multi-Modal Transformer (MMT) and pretrained on heterogeneous diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model...

arXiv Physics 2d ago

Finite-inertia effects in Langevin dynamics of a lopsided elastic dumbbell using exponential-time differencing schemes

arXiv:2605.31078v1 Announce Type: cross
Abstract: Inertia effects in the Langevin dynamics of a lopsided elastic dumbbell are investigated using exponential-time-differencing (ETD) integrators for the corresponding stiff stochastic equations in the small-mass limit. Starting from the bead-level underdamped Langevin model, we formulate the dynamics in modal coordinates, highlighting two distinct friction scales: an additive friction $\zeta_{\rm trans}=\zeta_1+\zeta_2$ controlling translation...

arXiv Physics 9d ago
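The dumbbell abstract relies on exponential-time-differencing (ETD) integrators to handle stiffness at small mass. The first-order update below is a minimal sketch of the general idea, not the authors' scheme. It integrates the linear friction term of a single underdamped Langevin velocity exactly, holds the deterministic force constant over the step, and samples the exactly accumulated Ornstein-Uhlenbeck noise. The function name, the parameter values, and the rigid-translation reading of zeta_trans = zeta_1 + zeta_2 in the comments are illustrative assumptions.

```python
import math
import random


def etd_velocity_step(v, force, zeta, mass, kT, h, rng):
    """One first-order exponential-time-differencing step for
        m dv = (-zeta * v + force) dt + sqrt(2 * zeta * kT) dW.
    The stiff linear friction term is integrated exactly, `force` is held
    constant over the step, and the Ornstein-Uhlenbeck noise is sampled with
    its exact one-step variance kT/m * (1 - exp(-2 zeta h / m))."""
    gamma = zeta / mass                   # inverse velocity-relaxation time
    decay = math.exp(-gamma * h)          # exact propagator of -gamma * v
    drift = (1.0 - decay) * force / zeta  # exact response to a constant force
    std = math.sqrt(kT / mass * (1.0 - decay * decay))
    return decay * v + drift + std * rng.gauss(0.0, 1.0)


# Hypothetical bead frictions. If both beads translate rigidly with velocity V,
# their drag forces add, giving zeta_trans = zeta_1 + zeta_2.
zeta_1, zeta_2 = 1.0, 3.0
zeta_trans = zeta_1 + zeta_2

rng = random.Random(0)
v = 5.0
# Tiny mass, and a step 40000 times longer than m / zeta_trans: explicit Euler
# would diverge here, while the ETD update relaxes v to force / zeta_trans.
for _ in range(10):
    v = etd_velocity_step(v, force=2.0, zeta=zeta_trans, mass=1e-6,
                          kT=0.0, h=0.01, rng=rng)
print(v)  # 0.5
```

Because the decay factor exp(-zeta h / m) stays in (0, 1] for any step size, the update remains bounded in the small-mass limit where h greatly exceeds m / zeta. Explicit Euler instead multiplies the velocity by (1 - zeta h / m) each step and diverges once zeta h / m exceeds 2.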

Dynamic Interaction-Aware and Causality-Disentangled Framework for Multimodal Sentiment Analysis

arXiv:2605.30994v1 Announce Type: new
Abstract: Although Multimodal Sentiment Analysis (MSA) effectively leverages rich information from language, visual, and acoustic modalities, existing methods still face two core challenges: 1) static conflict suppression mechanisms fail to adapt to dynamic variations across samples, and 2) the inherent sentimental bias within the language modality, which can misguide learning from other modalities, remains entangled. To this end, we propose a Dynamic...

arXiv CS 9d ago

Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding

arXiv:2606.09331v1 Announce Type: new
Abstract: Omni-modal retrieval promises a single embedding space for text, image, video, document, and audio inputs, but building such a unified retriever is difficult since these modalities differ in data distribution, architecture, and optimization dynamics. In this work, we present Conan-embedding-v3, a decouple-fuse-recover framework for omni-modal retrieval. Conan-embedding-v3 first trains modality specialists independently and fuses their task...

arXiv CS 1d ago

DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance

Announce Type: new
Abstract: Current end-to-end autonomous driving systems predominantly rely on frame-based sensors, which suffer from inherent perception latency and motion blur during highly dynamic encounters, specifically sudden pedestrian crossings. To address this critical safety vulnerability, we propose DeepIPCv3, a novel multi-modal autonomous navigation framework that synergizes the dense 3D spatial geometry of LiDAR point clouds with the microsecond-level asynchronous event...

arXiv CS 8d ago

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

arXiv:2605.05225v3 Announce Type: replace
Abstract: Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load balancing methods fail to address two unique challenges: (1) Information Heterogeneity, where numerous redundant visual tokens are treated equally to semantically critical ones,...

arXiv CS 2d ago
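The MACS abstract attributes the EP inference bottleneck to the straggler effect. A toy cost model makes the effect concrete. It assumes each expert-parallel layer's latency is proportional to the largest per-device token count and ignores communication and per-expert compute differences. The function names and device loads are hypothetical, and this is not the MACS capacity-scaling method.

```python
def ep_step_time(tokens_per_device, cost_per_token=1.0):
    """Toy latency of one expert-parallel layer: every device must finish
    before the results are combined, so the slowest (straggler) device
    sets the pace."""
    return max(tokens_per_device) * cost_per_token


def imbalance_ratio(tokens_per_device):
    """Max-to-mean load ratio; 1.0 means perfectly balanced token counts."""
    mean_load = sum(tokens_per_device) / len(tokens_per_device)
    return max(tokens_per_device) / mean_load


# Hypothetical routing outcome on 4 devices after a burst of image tokens
# lands on the experts hosted by the last device.
skewed = [100, 100, 100, 300]
balanced = [150, 150, 150, 150]

print(ep_step_time(skewed), imbalance_ratio(skewed))      # 300.0 2.0
print(ep_step_time(balanced), imbalance_ratio(balanced))  # 150.0 1.0
```

The same 600 tokens take twice as long when one device holds half of them. Equalizing token counts drives the ratio toward 1.0, but it still treats every token as equally costly and equally important, which is the assumption the abstract's Information Heterogeneity point questions.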

FAWAM: Force-Aware World Action Models for Closed-Loop Contact-Rich Manipulation

Announce Type: new
Abstract: Force signals provide critical interaction cues for contact-rich robotic manipulation. However, existing methods mostly use force as an additional observation modality, without fully exploiting its role in modeling future interaction dynamics or guiding execution-time feedback correction.

arXiv CS 1d ago

JI-ADF: Joint-Individual Learning with Adaptive Decision Fusion for Multimodal Skin Lesion Classification

Announce Type: replace
Abstract: Skin lesion classification is essential for early dermatological diagnosis, yet many existing computer-aided systems rely primarily on dermoscopic images and underutilize the multimodal evidence routinely available in clinical practice. To address this gap, we propose JI-ADF, a trimodal deep learning framework that integrates dermoscopic images, clinical photographs, and structured patient metadata for clinically grounded skin lesion classification....

arXiv CS 5d ago
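The JI-ADF abstract names adaptive decision fusion over three modalities, but the truncated text does not give the fusion rule. The sketch below shows a generic weighted late fusion instead, in which per-modality class probabilities are averaged with softmax-normalized weights. The function names, the per-modality scores (which a trained system might produce with a learned gating module), and the example probabilities are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_decision_fusion(modality_probs, modality_scores):
    """Decision-level fusion: a convex combination of per-modality class
    probability vectors, weighted by softmax(modality_scores)."""
    if len(modality_probs) != len(modality_scores):
        raise ValueError("need exactly one score per modality")
    weights = softmax(modality_scores)
    n_classes = len(modality_probs[0])
    fused = [0.0] * n_classes
    for w, probs in zip(weights, modality_probs):
        for k, p in enumerate(probs):
            fused[k] += w * p
    return fused


# Hypothetical outputs of three unimodal classifiers over 3 lesion classes.
dermoscopy = [0.70, 0.20, 0.10]
clinical_photo = [0.50, 0.30, 0.20]
metadata = [0.20, 0.50, 0.30]
scores = [2.0, 1.0, 0.5]  # hypothetical per-modality reliability scores

fused = weighted_decision_fusion([dermoscopy, clinical_photo, metadata], scores)
print([round(p, 3) for p in fused])
```

Because the weights sum to 1 and each input is a probability vector, the fused output is also a probability vector. With equal scores the rule reduces to a plain average of the three classifiers.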