Home › Knowledge Base › 3D

3D

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

3DThinkVLA: Endowing Vision-Language-Action Models with Latent 3D Priors via 3D-Thinking-Guided Co-training

Announce Type: new Abstract: We propose a 3D-thinking-guided co-training framework that enables vision-language-action (VLA) models to perform 3D spatial reasoning implicitly during action prediction. Our core insight is that 3D geometry perception and 3D spatial reasoning are distinct capabilities that can be disentangled and injected at different feature hierarchies. During training, three tightly coupled components work in concert primarily within the latent space: (1) To gain geometric...

arXiv CS 6d ago

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

arXiv:2606.07436v1 Announce Type: new Abstract: This paper explores agentic 3D spatial understanding, i.e., MLLM agents performing 3D reasoning through tool use. Existing methods often misuse tools and exhibit biased tool preferences under 3D scenarios, leaving the agentic paradigm with only marginal gains over non-agentic strategies. We reveal that 3D spatial reasoning tasks are heterogeneous across scenes, while these agents apply a uniform tool-use strategy to all scenes rather than...

arXiv CS 2d ago

Controllable Dynamic 3D Shape Generation via 3D Trajectories and Text

Announce Type: new Abstract: We introduce T2Mo, a feed-forward framework for controllable dynamic 3D shape generation conditioned on 3D trajectories and text. Due to the inherent ambiguity of language, generating precisely intended motions using text alone remains challenging. To address this, we adopt 3D trajectories as controllable spatial guidance, specifying the exact paths along which selected points should move.

arXiv CS 6d ago

CRAG: Can 3D Generative Models Help 3D Assembly?

Announce Type: replace Abstract: Most existing 3D assembly methods treat the problem as pure pose estimation, rearranging observed parts via rigid transformations. In contrast, human assembly naturally couples structural reasoning with holistic shape inference. Inspired by this intuition, we reformulate 3D assembly as a joint problem of assembly and generation.

arXiv CS 1d ago

3D-Belief: Embodied Belief Inference via Generative 3D World Modeling

arXiv:2605.11367v2 Announce Type: replace Abstract: Recent advances in visual generative models have highlighted the promise of learning generative world models. However, most existing approaches frame world modeling as novel-view synthesis or future-frame prediction, emphasizing visual realism rather than the structured uncertainty required by embodied agents acting under partial observability. In this work, we propose a different perspective: world modeling as embodied belief inference in...

arXiv CS 9d ago

PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding

new Abstract: Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remain largely object-centric, limiting their ability to model fine-grained part structures that are essential for embodied interaction with 3D environments. In this work, we present PAR3D, a unified part-aware 3D-MLLM framework that enables...

arXiv CS 5d ago

Unified Semantic Transformer for 3D Scene Understanding

arXiv:2512.14364v3 Announce Type: replace Abstract: Holistic 3D scene understanding involves capturing and parsing unstructured 3D environments. Due to the inherent complexity of the real world, existing models have predominantly been developed and limited to be task-specific. We introduce UNITE, a Unified Semantic Transformer for 3D scene understanding, a novel feed-forward neural network that unifies a diverse set of 3D dense semantic indoor tasks within a single model.

arXiv CS 8d ago

iLRM: An Iterative Large 3D Reconstruction Model

arXiv:2507.23277v3 Announce Type: replace Abstract: Feed-forward 3D modeling has emerged as a promising approach for rapid and high-quality 3D reconstruction. In particular, directly generating explicit 3D representations, such as 3D Gaussian splatting, has attracted significant attention due to its fast and high-quality rendering. However, many state-of-the-art methods, primarily based on transformer architectures, suffer from severe scalability issues because they rely on full attention...

arXiv CS 8d ago

Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs

arXiv:2606.01215v1 Announce Type: new Abstract: Current 3D spatial reasoning methods face a fundamental trade-off: neuro-symbolic 3D (NS3D) concept learners achieve interpretable reasoning through compositional programs but are constrained to closed-set concept vocabularies and simple programs; end-to-end 3D multi-modal LLMs (3D MLLMs) could handle complex natural language and open-vocabulary concepts but suffer from black-box reasoning without explicit spatial verification. We introduce...

arXiv CS 8d ago

SAM 3D: 3Dfy Anything in Images

arXiv:2511.16624v2 Announce Type: replace Abstract: We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve this with a human- and model-in-the-loop pipeline for annotating object shape, texture, and pose, providing visually grounded 3D reconstruction...

arXiv CS 6d ago