Multimodal Fusion

Related Articles from SNS

Geometry-based Schrödinger Bridges for Trustworthy Multimodal Fusion

Announce Type: new Abstract: Real-world multimodal systems must be robust against low-quality data, such as sensor noise, incomplete multimodal data and conflicting inputs. However, existing trustworthy fusion methods rely on the model's own prediction confidence to judge data quality. This creates a circular dependency: when a model is confident but wrong, these methods fail to detect the error.

arXiv CS 9d ago

UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations

arXiv:2510.13774v2 Announce Type: replace Abstract: Forecasting urban phenomena such as housing prices and public health indicators requires the effective integration of various geospatial data. Current methods primarily utilize task-specific models, while recent generic models for spatial representations often support only limited modalities and lack multimodal fusion capabilities. To overcome these challenges, we present UrbanFusion, a spatial representation model that features Stochastic...

arXiv CS 8d ago

Reasoning-Aware Multimodal Fusion for Hateful Video Detection

arXiv:2512.02743v2 Announce Type: replace Abstract: Hate speech in online videos is posing an increasingly serious threat to digital platforms, especially as video content becomes increasingly multimodal and context-dependent. Existing methods often struggle to effectively fuse the complex semantic relationships between modalities and lack the ability to understand nuanced hateful content. To address these issues, we propose an innovative Reasoning-Aware Multimodal Fusion (RAMF) framework.

arXiv CS 9d ago

CL-DMDF: Dynamic Multimodal Data Fusion Model Based on Contrastive Learning

arXiv:2606.02659v1 Announce Type: new Abstract: Multimodal data fusion involves integrating and analyzing information from multiple modalities to uncover latent correlations and complementary patterns, thereby enhancing data processing and decision-making. While existing methods for structured multimodal inputs are typically designed around specific tasks and assume fully observed modalities, real-world applications often suffer from uncertain or missing modality inputs due to various...

arXiv CS 7d ago

CAMF-Det: Closure-Aware Multimodal Fusion for LiDAR-Camera 3D Object Detection on UAV Platforms

arXiv:2606.09143v1 Announce Type: new Abstract: Multimodal 3D object detection based on LiDAR and cameras has demonstrated excellent performance in ground-vehicle scenarios, but has not been explored for Unmanned Aerial Vehicle (UAV) platforms. In UAV top-down scenes, frequent ground-object occlusion dominated by tree canopies causes spatially varying and modality-dependent information degradation.

arXiv CS 1d ago

COMPASS: Complete Multimodal Fusion via Proxy Tokens and Shared Spaces for Ubiquitous Sensing

arXiv:2604.02056v2 Announce Type: replace Abstract: Missing modalities in multimodal sensing cause not only information loss but also a fusion-interface mismatch: a fusion head trained on a canonical set of modality slots must operate on changing observed subsets at inference time. We propose Compass, an interface-complete fusion framework that restores this canonical slot structure before prediction. Each modality is assigned a fixed fusion slot.

arXiv CS 1d ago

PAMF: Prior-Aware Multimodal Fusion for Incomplete Time Series Data

Announce Type: new Abstract: In healthcare, multimodal time series tasks often operate on incomplete observations in practice, for example when ECG segments are lost because electrodes detach or an entire respiratory channel is unavailable during overnight monitoring. Such missingness typically appears in two structurally distinct patterns: within-modality missing, where values are absent within an otherwise observed modality, and modality-level missing, where an entire modality is unavailable.

arXiv CS 5d ago

Multimodal Fusion via Self-Consistent Task-Gradient Fields

arXiv:2410.15475v2 Announce Type: replace Abstract: Multimodal learning aims to preserve as much task-related information as possible from different inputs. However, current fusion designs often distort the feedback loop to feature extractors. Aggressively merging modalities entangles their representations, making the feature extractors fragile to incomplete inputs.

arXiv CS 9d ago

ExpSpeech-Net: Multimodal Fusion of Expression and Speech for Deepfake Detection

Announce Type: new Abstract: Deepfake videos are increasingly challenging the credibility of online content. Many existing detection methodologies rely on complex, resource-intensive models, which limits their practical use. The study introduces the ExpSpeech-Net deepfake detection (SqN-R-DFD) model, which utilizes SqueezeNet and RNN (Recurrent Neural Network) as its backbone, providing a lightweight and efficient deepfake detection framework that simultaneously analyzes facial expressions...

arXiv CS 5d ago

Step-adaptive multimodal fusion network with multi-scale cloud feature learning for ultra-short-term solar irradiance forecasting

arXiv:2606.06102v1 Announce Type: new Abstract: Ultra-short-term solar irradiance prediction is critical for photovoltaic system dispatch and power grid stability. Existing approaches suffer from three key shortcomings: single time-series models cannot capture the spatial dynamics of clouds under complex conditions, standard convolutions inadequately represent multi-scale cloud features, and fixed low-frequency compensation strategies fail to adapt to different prediction steps. To address...

arXiv CS 5d ago