Home › Knowledge Base › the Visual Geometry Grounded Transformer

the Visual Geometry Grounded Transformer

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer

Announce Type: new Abstract: Estimating 3D attributes directly from images has advanced rapidly with the Visual Geometry Grounded Transformer (VGGT), which predicts camera parameters, depth maps, and point clouds in a single forward pass. However, its 1.2B-parameter scale severely limits deployment on resource-constrained platforms such as UAVs and mobile AR devices. To address this limitation, we introduce QVGGT, a tailored quantization framework designed to compress VGGT.

arXiv CS 9d ago

$\text{VG}^2$GT: Voxel-Gaussian Splatting Visual Geometry Grounded Transformer

Announce Type: replace Abstract: Gaussian splatting has shown strong potential for 3D reconstruction and novel view synthesis. However, most existing methods require accurate camera parameters and per-scene optimization, while feed-forward methods with pixel-aligned Gaussian primitives often suffer from artifacts and non-uniform primitives. In this paper, we propose $\text{VG}^2$GT, a Voxel-Gaussian Splatting Visual Geometry-Grounded Transformer.

arXiv CS 6d ago

$\text{VG}^2$GT: Voxel-Gaussian Splatting Visual Geometry Grounded Transformer

Announce Type: new Abstract: Gaussian splatting has shown strong potential for 3D reconstruction and novel view synthesis. However, most existing methods require accurate camera parameters and per-scene optimization, while feed-forward methods with pixel-aligned Gaussian primitives often suffer from artifacts and non-uniform primitives. In this paper, we propose $\text{VG}^2$GT, a Voxel-Gaussian Splatting Visual Geometry-Grounded Transformer.

arXiv CS 8d ago

LiAuto-GeoX: Efficient Grounded Driving Transformer

Announce Type: new Abstract: Dense 3D reconstruction has demonstrated immense potential for spatial understanding, yet its viability as a real-time, onboard representation for autonomous driving remains an open challenge. Existing large-scale visual geometry models typically require substantial computational resources and lack the long-range geometric fidelity, surround-view consistency, and real-time efficiency demanded by dynamic driving environments. To bridge this gap, we present...

arXiv CS 5d ago

Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration

arXiv:2605.31196v1 Announce Type: new Abstract: Safe human--robot collaboration requires more than visual description: a monitor must determine whether the robot body is safely separated, already colliding with the scene or a person, or about to collide. We call this capability collision grounding: binding visual observations to robot body geometry, camera viewpoint, scene layout, human proximity, and temporal motion in order to infer present and imminent contact. We introduce...

arXiv CS 9d ago