Home Knowledge Base MMSI

MMSI

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Announce Type: new Abstract: While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason from alternative viewpoints when only limited egocentric observations are available. In this work, we study this problem as thinking with imagination, where a VLM...

arXiv CS 5d ago

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

arXiv:2606.07436v1 Announce Type: new Abstract: This paper explores agentic 3D spatial understanding, i.e., MLLM agents performing 3D reasoning through tool use. Existing methods often misuse tools and exhibit biased tool preferences under 3D scenarios, leaving the agentic paradigm with only marginal gains over non-agentic strategies. We reveal that 3D spatial reasoning tasks are heterogeneous across scenes, while these agents apply a uniform tool-use strategy to all scenes rather than...

arXiv CS 2d ago

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

arXiv:2602.08236v2 Announce Type: replace Abstract: Despite rapid progress in MLLMs, visual spatial reasoning remains unreliable when correct answers depend on how a scene would appear under unseen or alternative viewpoints. Recent work addresses this by augmenting reasoning with world models for visual imagination, but questions such as when imagination is actually necessary, how much of it is beneficial, and when it becomes harmful, remain poorly understood.

arXiv CS 8d ago