AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Shouwei Ruan, Bin Wang, Zhenyu Wu, Qihui Zhu, Yuxiang Zhang, Jingzhi Li, Yubin Wang, Xingxing Wei

Key Points

arXiv:2606.08952v1. Abstract: Multimodal Foundation Models (MFMs) have made substantial progress, yet remain fragile in spatial reasoning over the physical world. A key bottleneck lies in their inability to transform local egocentric observations into a global allocentric spatial representation. To address this, we propose AlloSpatial, an agentic framework for allocentric spatial cognition in foundation models. AlloSpatial introduces World2Mind, a plug-and-play cognitive mapping sandbox that converts egocentric observations into structured allocentric priors, including Allocentric-Spatial Trees (ASTs) and route maps that support querying object topology, geometric relations, passability, and trajectories. To utilize these priors reliably under noisy reconstruction and ambiguous visual evidence, AlloSpatial introduces a Spatial Reasoning Harness for tool-use judgment, modality-decoupled cue collection, and geometry-semantic arbitration. We further internalize this process in Qwen3-VL through cold-start reinforcement learning with a harness-gated trajectory-level reward. Experiments on VSI-Bench and MindCube show that AlloSpatial improves proprietary models by 5%-18% in a training-free setting, while ASTs alone support strong spatial reasoning even when visual inputs are removed. The trained AlloSpatial agents further outperform larger general-purpose models and competitive spatial baselines, suggesting that structured allocentric representations, active tool use, and verifiable reasoning offer a promising route toward spatially capable foundation models.
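To make the egocentric-to-allocentric step concrete, here is a minimal 2D sketch of the underlying geometry. It is not the authors' World2Mind implementation, which the abstract does not detail. The names Pose2D, ego_to_allo, and allocentric_map are hypothetical, and the poses and offsets are made-up illustration values. Each observation is an offset (forward, left) measured in the agent's own frame; rotating it by the agent's heading and translating by the agent's position places the object in a shared world frame, where relations between objects seen from different viewpoints can be queried.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Hypothetical agent pose in the world (allocentric) frame."""
    x: float
    y: float
    heading: float  # radians, counter-clockwise from the world +x axis


def ego_to_allo(pose: Pose2D, forward: float, left: float) -> tuple[float, float]:
    """Map an egocentric offset (forward, left) to world coordinates.

    Standard 2D rigid transform: rotate by the heading, then translate
    by the agent position.
    """
    c, s = math.cos(pose.heading), math.sin(pose.heading)
    world_x = pose.x + c * forward - s * left
    world_y = pose.y + s * forward + c * left
    return (world_x, world_y)


# Two viewpoints, each seeing a different object (illustrative values only).
view_a = Pose2D(x=2.0, y=1.0, heading=math.pi / 2)  # facing world +y
view_b = Pose2D(x=5.0, y=4.0, heading=math.pi)      # facing world -x

allocentric_map = {
    "chair": ego_to_allo(view_a, forward=3.0, left=0.0),
    "lamp": ego_to_allo(view_b, forward=1.0, left=2.0),
}

# Once both objects share one frame, a geometric relation between them
# can be computed even though no single view contained both.
chair_to_lamp = math.dist(allocentric_map["chair"], allocentric_map["lamp"])
print(allocentric_map, chair_to_lamp)
```

A flat dictionary of coordinates stands in here for the structured priors the abstract describes; a tree or route map would add hierarchy and passability on top of the same shared frame.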


Originally published by arXiv CS.

