Home › Knowledge Base › Multi-View 3D

Multi-View 3D

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

DisPOSE: Projected Polystochastic Diffusion for Self-Supervised Multi-View 3D Human Pose Estimation

arXiv:2606.07419v1 Announce Type: new Abstract: Recovering 3D human poses for multiple individuals from different camera views is a fundamental bottleneck for analyzing interacting behaviors. Existing self-supervised approaches leverage synthetic catalogues of 3D poses; however, this leads to poor generalization in real-world scenarios due to distribution shifts. We therefore introduce DisPOSE, a self-supervised framework that approximates the inherently discrete multi-view person-assignment...

arXiv CS 2d ago

Unsupervised Monocular 3D Keypoint Discovery from Multi-View Diffusion Priors

arXiv:2507.12336v2 Announce Type: replace Abstract: Most existing 3D keypoint estimation methods rely on manual annotations or calibrated multi-view images, both of which are expensive to collect. This paper introduces KeyDiff3D, a framework that can accurately predict 3D keypoints from a single image, thus eliminating the need for such expensive data acquisitions. To achieve this, we leverage powerful geometric priors embedded in a pretrained multi-view diffusion model.

arXiv CS 5d ago

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

arXiv:2605.21472v2 Announce Type: replace Abstract: View-conditioned 3D generators such as SAM 3D, TRELLIS, and Hunyuan3D produce high-quality object reconstructions from a single view, but real-world visual observation often arrives as long monocular streams. Naively applying these generators to each streaming frame independently leads to severe temporal inconsistency in the generated results. To address this problem, we propose Stream3D, the first training-free streaming mechanism that...

arXiv CS 9d ago

Multi-view Pyramid Transformer: Look Coarser to See Broader

arXiv:2512.07806v2 Announce Type: replace Abstract: We propose Multi-view Pyramid Transformer (MVP), a scalable multi-view transformer architecture that directly reconstructs large 3D scenes from tens to hundreds of images in a single forward pass. Drawing on the idea of ``looking broader to see the whole, looking finer to see the details," MVP is built on two core design principles: 1) a local-to-global inter-view hierarchy that gradually broadens the model's perspective from local views to...

arXiv CS 8d ago

BEAST3D: Animal behavioral analysis and neural encoding from multi-view video via Gaussian splatting

arXiv:2606.02937v1 Announce Type: cross Abstract: Multi-view video recordings are increasingly used to capture the 3D movements of animals in experimental settings, yet extracting rich 3D representations from these recordings remains challenging. Supervised pose estimation requires extensive manual annotation, while general-purpose 3D reconstruction models trained on generic scene datasets fail on the specialized imagery and sparse-view setting of laboratory experiments. We address these...

arXiv CS 7d ago

DisPOSE: Projected Polystochastic Diffusion for Self-Supervised Multi-View 3D Human Pose Estimation

arXiv:2606.07419v2 Announce Type: replace Abstract: Recovering 3D human poses for multiple individuals from different camera views is a fundamental bottleneck for analyzing interacting behaviors. Existing self-supervised approaches leverage synthetic catalogues of 3D poses; however, this leads to poor generalization in real-world scenarios due to distribution shifts.

arXiv CS 1d ago

ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Discrete Diffusion Models

arXiv:2512.14099v3 Announce Type: replace Abstract: Motivated by discrete diffusion's success in language-vision modeling, we explore its potential for multi-view generation, a task dominated by continuous approaches. We introduce ViewMask-1-to-3, formulating multi-view generation as a discrete sequence modeling problem where each viewpoint is represented as visual tokens from MAGVIT-v2. Through discrete diffusion via masked token prediction, our approach enables progressive multi-view...

arXiv CS 6d ago

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

arXiv:2605.21472v3 Announce Type: replace Abstract: View-conditioned 3D generators such as SAM 3D, TRELLIS, and Hunyuan3D produce high-quality object reconstructions from a single view, but real-world visual observation often arrives as long monocular streams. Naively applying these generators to each streaming frame independently leads to severe temporal inconsistency in the generated results. To address this problem, we propose Stream3D, the first training-free streaming mechanism that...

arXiv CS 2d ago

D\'ej\`a View: Looping Transformers for Multi-View 3D Reconstruction

arXiv:2605.30215v2 Announce Type: replace Abstract: Recent feed-forward 3D reconstruction transformers have scaled to over a billion parameters, following the broader trend of increasing model capacity in computer vision. Yet emerging evidence suggests that contiguous transformer layers often behave like repeated applications of similar operations, and multi-view reconstruction transformers refine their predictions progressively across decoder depth. We posit that model depth partially buys...

arXiv CS 9d ago

COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation

Announce Type: replace Abstract: 3D human pose estimation from sparse multi-view camera rigs is an essential task for numerous applications, including action recognition, sports analysis, and human-robot interaction. While learned methods dominate the field on benchmarks, they require large annotated datasets; training-free optimization-based methods remain promising as they circumvent 3D supervision by solving a correspondence problem across views from 2D detections. Existing combinatorial...

arXiv CS 2d ago