VGGT

Related Articles from SNS

QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer

Estimating 3D attributes directly from images has advanced rapidly with the Visual Geometry Grounded Transformer (VGGT), which predicts camera parameters, depth maps, and point clouds in a single forward pass. However, its 1.2B-parameter scale severely limits deployment on resource-constrained platforms such as UAVs and mobile AR devices. To address this limitation, we introduce QVGGT, a tailored quantization framework designed to compress VGGT.

arXiv CS 9d ago

WristCompass: Kinematic Coupling as a Learnable Visual Concept for Ego-Camera Orientation

arXiv:2605.30671v1. Recovering ego-camera orientation from manipulation video is a prerequisite for disentangling hand motion from camera motion, a key step in imitation learning from egocentric demonstrations. The obvious approach, inferring orientation from scene geometry, fails when hands occlude the frame: VGGT, a 1B-parameter scene reconstruction model, scores worse than a constant predictor on the TACO benchmark. We identify an alternative visual concept...

arXiv CS 9d ago

Unpaired RGB-Thermal Gaussian-Splatting Using Visual Geometric Transformers

arXiv:2606.05491v1. Multi-modal novel view synthesis (NVS) combining RGB and thermal imagery enables precise 3D scene reconstruction with visual and thermal information. However, existing methods typically rely on precisely calibrated RGB-thermal image pairs or stereo setups, limiting scalability and practical deployment.

arXiv CS 5d ago

CamGeo: Sparse Camera-Conditioned Image-to-Video Generation with 3D Geometry Priors

arXiv:2605.30895v2. Sparse camera-conditioned image-to-video generation presents a pivotal challenge: synthesizing geometrically consistent 3D motion from minimal pose cues. Existing methods, which largely rely on dense supervision or naive interpolation, suffer from severe pose drift and motion discontinuities due to the lack of robust 3D priors. In this paper, we introduce CamGeo, a novel framework that distills rich 3D geometric knowledge from a pre-trained video-to-3D model...

arXiv CS 8d ago
