Adaptive Multi-Modal Reasoning

Related Articles from SNS

Dreaming when Necessary: Advancing World Action Models with Adaptive Multi-Modal Reasoning

arXiv:2606.07089v1 Announce Type: new Abstract: World Action Models (WAMs) offer a promising approach to embodied intelligence, yet existing methods rely heavily on video prediction as action priors and lack adaptive multimodal reasoning, limiting their effectiveness on long-horizon, complex tasks. We observe that WAMs require different multimodal reasoning modes under different execution contexts: textual reasoning is essential during task transitions to guide high-level action prediction,...

arXiv CS 2d ago

Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models

arXiv:2502.01576v2 Announce Type: replace Abstract: Multi-modal Large Language Models (MLLMs) excel in vision-language tasks but remain vulnerable to visual adversarial perturbations that can induce hallucinations, manipulate responses, or bypass safety mechanisms. Existing methods seek to mitigate these risks by applying constrained adversarial fine-tuning to CLIP vision encoders on ImageNet-scale data, ensuring their generalization ability is preserved. However, this limited adversarial...

arXiv CS 6d ago

Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models

arXiv:2606.03793v1 Announce Type: new Abstract: Multimodal Large Language Models integrate visual perception into language reasoning, introducing a continuous attack surface susceptible to adversarial attacks. Prior work on MLLM robustness has focused largely on English-centric tasks, leaving multilingual behaviour unexplored. We address this gap through a systematic study of adversarial robustness and multimodal safety across 12 diverse languages, evaluating open-source MLLMs that acquire...

arXiv CS 7d ago

CLEAR: Cognition and Latent Evaluation for Adaptive Routing in End-to-End Autonomous Driving

arXiv:2606.06219v1 Announce Type: new Abstract: End-to-end autonomous driving models often struggle to balance multi-modal maneuver generation with real-time inference constraints. While diffusion models successfully capture diverse driving behaviors, their iterative denoising process incurs unacceptable latency for safety-critical deployment. To address this, we propose CLEAR (Cognition and Latent Evaluation for Adaptive Routing), a framework that combines ultra-fast generative planning...

arXiv CS 5d ago

Human-Like Neural Nets by Catapulting

Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are...

Hacker News 3d ago

Tessera AI model offers accessible way to view Earth

Lisa Lock, Scientific Editor; Andrew Zinin, Lead Editor. A foundation model trained on Earth observation data from Copernicus Sentinel-1 and Sentinel-2 has been made widely available to researchers, it was announced at a computer industry conference this week in Denver, U.S. Tessera, an advanced artificial intelligence (AI) model, offers high-accuracy datasets that encode what the satellite "sees" of Earth's surface during the course of a...

Phys.org 5d ago