
Spatial Reasoning Harness

Related Articles from SNS

AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models

arXiv:2606.08952v1. Abstract: Multimodal Foundation Models (MFMs) have made substantial progress, yet remain fragile in spatial reasoning over the physical world. A key bottleneck lies in their inability to transform local egocentric observations into a global allocentric spatial representation. To address this, we propose AlloSpatial, an agentic framework for allocentric spatial cognition in foundation models.

arXiv CS 1d ago
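The egocentric-to-allocentric step named in the AlloSpatial abstract can be illustrated with the standard rigid-body change of frame. The sketch below is not AlloSpatial's method, since the abstract does not describe its representation. It assumes a 2D planar world, an agent whose global pose (position and heading) is known, and observations given as (forward, left) offsets from the agent. `Pose2D`, `ego_to_allo`, and `build_allocentric_map` are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Hypothetical agent pose expressed in the global (allocentric) frame."""
    x: float
    y: float
    theta: float  # heading in radians, counter-clockwise from the world x-axis


def ego_to_allo(pose: Pose2D, forward: float, left: float) -> tuple[float, float]:
    """Map a point seen in the agent's egocentric frame (x forward, y left)
    to global coordinates via p_world = R(theta) @ p_ego + t."""
    c, s = math.cos(pose.theta), math.sin(pose.theta)
    return (pose.x + c * forward - s * left,
            pose.y + s * forward + c * left)


def build_allocentric_map(observations):
    """Fuse (pose, label, forward, left) observations into a single global map
    keyed by object label; a later observation of a label overwrites an earlier one."""
    world = {}
    for pose, label, forward, left in observations:
        world[label] = ego_to_allo(pose, forward, left)
    return world


# Usage: the agent starts at the origin facing +x and sees a chair 2 m ahead,
# then moves to (2, 0), turns to face +y, and sees a table 1 m ahead.
observations = [
    (Pose2D(0.0, 0.0, 0.0), "chair", 2.0, 0.0),
    (Pose2D(2.0, 0.0, math.pi / 2), "table", 1.0, 0.0),
]
print(build_allocentric_map(observations))
```

Two views that are each local to the agent end up in one shared frame, which is what makes relations such as "the table is 1 m to the left of the chair" answerable from a global map rather than from either single view.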

Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction

arXiv:2601.09285v2. Abstract: Metal-organic frameworks (MOFs) are porous crystalline materials with broad applications such as carbon capture and drug delivery, yet accurately predicting their 3D structures remains a significant challenge. While Large Language Models (LLMs) have shown promise in generating crystal structures, their application to MOFs is hindered by MOFs' high structural complexity arising from the large number of atoms in the unit cell. Inspired by the...

arXiv CS 1d ago

MUSE: A Unified Agentic Harness for MLLMs

Abstract: Despite rapid progress, multimodal large language models (MLLMs) still fail on tasks that humans solve effortlessly, such as navigating a grid maze from a screenshot or selecting the correct puzzle piece. Rather than retraining the model, we ask a complementary question: how much capability can be elicited from a frozen MLLM purely by improving the execution scaffold around it? We introduce MUSE, a multimodal unified structured execution harness that wraps any...

arXiv CS 7d ago

Enginuity: A Dataset and Benchmark for Vision-Language Understanding of Engineering Diagrams

Abstract: Engineering diagrams pose a distinct challenge for vision-language models: unlike natural images or general documents, they encode information through dense spatial layouts, domain-specific symbols, and cross-references between visual callouts and structured parts tables. Despite their centrality to service, repair, and design workflows, there is no public benchmark for measuring VLM capabilities in this domain; existing datasets primarily focus on flowcharts,...

arXiv CS 7d ago