Unsupervised Monocular 3D Keypoint Discovery from Multi-View Diffusion Priors

arXiv:2507.12336v2

Abstract: Most existing 3D keypoint estimation methods rely on manual annotations or calibrated multi-view images, both of which are expensive to collect. This paper introduces KeyDiff3D, a framework that can accurately predict 3D keypoints from a single image, thus eliminating the need for such expensive data acquisitions. To achieve this, we leverage powerful geometric priors embedded in a pretrained multi-view diffusion model. In our framework, the diffusion model generates multi-view images from a single image, serving as supervision signals to provide 3D geometric cues to our model. We also introduce a 3D feature extractor that transforms implicit 3D priors embedded in the diffusion features into explicit 3D feature volumes. Beyond accurate keypoint estimation, we further introduce a pipeline that enables manipulation of 3D objects generated by the diffusion model. Experimental results on diverse datasets, including Human3.6M, CUB-200-2011, Stanford Dogs, and several in-the-wild and out-of-domain inputs, highlight the effectiveness of our method in terms of accuracy, generalization, and its ability to enable manipulation of 3D objects generated by the diffusion model from a single image.
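
To make the idea of reading keypoints out of an explicit 3D feature volume and checking them against several views more concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not say how keypoints are extracted or how the generated views supervise them, so the soft-argmax readout, the pinhole camera, and the names `soft_argmax_3d`, `yaw_rotation`, and `project` are all illustrative assumptions.

```python
import numpy as np


def soft_argmax_3d(volume):
    """Expected (x, y, z) location, in [-1, 1]^3, of a score volume of shape (D, H, W).

    Assumption: one keypoint per score volume, read out with a softmax-weighted
    mean. The abstract does not specify the readout actually used.
    """
    d, h, w = volume.shape
    # Softmax over all voxels, shifted by the max for numerical stability.
    probs = np.exp(volume - volume.max())
    probs /= probs.sum()
    zs = np.linspace(-1.0, 1.0, d)
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    # Marginalise the voxel distribution onto each axis and take the mean.
    z = (probs.sum(axis=(1, 2)) * zs).sum()
    y = (probs.sum(axis=(0, 2)) * ys).sum()
    x = (probs.sum(axis=(0, 1)) * xs).sum()
    return np.array([x, y, z])


def yaw_rotation(angle):
    """Rotation about the vertical (y) axis, standing in for a generated viewpoint."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project(point, rotation, translation, focal=1.0):
    """Pinhole projection of a 3D point into a view with extrinsics (R, t)."""
    cam = rotation @ point + translation
    return focal * cam[:2] / cam[2]


# Toy volume with a single sharp peak at voxel (z=0, y=2, x=4).
volume = np.zeros((5, 5, 5))
volume[0, 2, 4] = 50.0
keypoint = soft_argmax_3d(volume)

# Project the same 3D keypoint into two hypothetical views 90 degrees apart.
t = np.array([0.0, 0.0, 3.0])
views = [project(keypoint, yaw_rotation(a), t) for a in (0.0, np.pi / 2)]
```

In a training setup of this general kind, the projected 2D locations in each view could be compared against evidence from the corresponding generated image, so that a single 3D estimate has to agree with all views at once.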
Originally published by arXiv CS.