Home › Knowledge Base › Static Spatial

Static Spatial

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

SpaMEM: Benchmarking Dynamic Spatial Reasoning via Perception-Memory Integration in Embodied Environments

arXiv:2604.22409v3 Announce Type: replace Abstract: Multimodal large language models (MLLMs) have advanced static visual--spatial reasoning, yet they often fail to preserve long-horizon spatial coherence in embodied settings where beliefs must be continuously revised from egocentric observations under environmental change. We introduce SpaMEM (Spatial Memory from Action Sequences), a large-scale diagnostic benchmark that isolates the mechanics of spatial belief evolution via...

arXiv CS 9d ago

Static and Dynamic Representations for Tactile Contact-Angle Estimation with Event-Based Sensors

Announce Type: new Abstract: Event-based tactile sensing offers low-latency signal acquisition for contact-rich robotic interaction. This paper investigates contact-angle estimation using event streams from an event-based tactile sensor (NeuroTac) and compares three event-derived spatial contour representations: a dynamic representation capturing recent event activity, a static representation recovering a more persistent contact state, and their combined representation. Across the evaluated...

arXiv CS 7d ago

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

Announce Type: new Abstract: Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, failing to assess general interactive spatial understanding. We introduce SpatialWorld, a unified benchmark designed specifically for evaluating the interactive spatial understanding of multimodal...

arXiv CS 1d ago

IMUG-Bench: Benchmarking Unified Multimodal Models on Interleaved Understanding and Generation

arXiv:2606.09169v1 Announce Type: new Abstract: In recent years, unified multimodal models (UMMs) have emerged to support both understanding and generation within a single framework. Mastering dynamic, multi-turn interleaved image-text dialogues is a crucial task for UMMs in real-world applications. However, existing benchmarks fail to evaluate this important task, as they are often limited to single-turn or static settings, and typically overlook exposure bias in multi-turn interactions.

arXiv CS 1d ago

Trustworthy Visual Predicates for Robust Manipulation Understanding under Degradation

arXiv:2606.08121v1 Announce Type: new Abstract: Manipulation understanding requires reliable relational evidence, such as contact, support, containment, motion coupling, grasp, release, and active-hand involvement. Although these visual predicates are widely used in event-chain, graph-based, and neuro-symbolic models, their reliability under visual degradation is rarely analyzed directly. This paper introduces a predicate-level reliability framework for robust manipulation understanding...

arXiv CS 1d ago

Agent Skills Should Go Beyond Text: The Case for Visual Skills

arXiv:2606.01414v1 Announce Type: new Abstract: Reusable skills are a key mechanism for extending agent capabilities, allowing agents to accumulate experience and solve increasingly complex tasks. Yet most existing skill-learning methods store reusable experience as text-only assets, such as instructions, reasoning traces, or summarized trajectories. We argue that this text-only paradigm creates a fundamental bottleneck for visual-centric tasks, where reusable knowledge often depends on...

arXiv CS 8d ago

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

arXiv:2602.08236v2 Announce Type: replace Abstract: Despite rapid progress in MLLMs, visual spatial reasoning remains unreliable when correct answers depend on how a scene would appear under unseen or alternative viewpoints. Recent work addresses this by augmenting reasoning with world models for visual imagination, but questions such as when imagination is actually necessary, how much of it is beneficial, and when it becomes harmful, remain poorly understood.

arXiv CS 8d ago

Order Matters: Unveiling the Hidden Impact of Macro Placement Sequences via Proxy-Guided LLM Evolution

Announce Type: new Abstract: Macro placement is a fundamental step in modern chip physical design, playing a crucial role in determining the solution quality of high-dimensional combinatorial optimization problems. Despite recent advancements in machine learning for spatial coordinate determination, the temporal dimension of placement sequencing remains largely governed by static heuristics. In this work, we demonstrate that the placement sequence is not merely a preprocessing step but a...

arXiv CS 1d ago

DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting

arXiv:2504.01531v4 Announce Type: replace Abstract: Accurate predictions of spatio-temporal systems are crucial for tasks such as system management, control, and crisis prevention. However, the inherent time variance of many spatio-temporal systems poses challenges to achieving accurate predictions whenever stationarity is not granted. In order to address non-stationarity, we propose a Distribution and Relation Adaptive Network (DRAN) capable of dynamically adapting to relation and...

arXiv CS 7d ago

TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

arXiv:2606.02350v1 Announce Type: new Abstract: Reconstructing humans and their surrounding environments in a globally consistent 4D space is essential for comprehensive perception. However, prior works typically assume single-view inputs or decouple humans, scenes, and cameras, making them unable to recover coherent geometry, stable motion, and physically aligned trajectories. These limitations motivate us to introduce a new task: unified human-scene-camera reconstruction from multi-view...

arXiv CS 8d ago