Monocular Videos
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos
Announce Type: replace Abstract: Monocular video human mesh recovery is essential for digital humans, avatar animation, and embodied simulation, where both temporal stability and expressive whole-body motion are required. Existing video HMR methods produce coherent body motion but often overlook detailed hand articulation, while image-based whole-body methods recover SMPL-X meshes independently per frame, often leading to jittery and inaccurate hand motion. We present a temporally coherent...
WebSpline: Structure-Informed Splines for Real-Time 3D Gaussians from Monocular Videos
arXiv:2606.02096v1 Announce Type: new Abstract: Dynamic scene reconstruction from monocular videos remains highly challenging, as existing methods often struggle to balance global structural coherence and local fine-grained details under limited multi-view cues. To address this challenge, we propose WebSpline, a novel dynamic 3D Gaussian framework that enables structurally coherent and high-fidelity reconstruction from monocular videos with fast rendering. The core of WebSpline is the...
From Pixels to Newtons: Predicting In Vivo Joint Contact Forces from Monocular Video
Announce Type: new Abstract: Joint contact forces govern implant longevity, cartilage health, and rehabilitation outcomes, shaping who develops osteoarthritis, who recovers well from joint replacement, and who benefits from biomechanical interventions. Yet they remain measurable only invasively, in a few dozen patients with instrumented implants. I present a physics-free pipeline to predict instantaneous 3D hip and knee contact forces from an uncalibrated monocular video: no markers, force...
Recovering Physically Plausible Human-Object Interactions from Monocular Videos
Announce Type: new Abstract: In this paper, we propose RePHO, a method to reconstruct physically plausible human-object interactions (HOI) from monocular videos. While existing kinematic-based approaches produce visually plausible motion, they often result in physically implausible artifacts such as interpenetration and object floating. To overcome these issues, we introduce a physics-guided reconstruction framework.
Tamaththul3D: High-Fidelity 3D Saudi Sign Language Avatars from Monocular Video
arXiv:2605.05367v2 Announce Type: replace Abstract: Existing 3D sign language avatar reconstruction methods are developed and evaluated exclusively on Western sign languages, and no 3D parametric annotations exist for any Arabic Sign Language dataset, a gap that blocks the development of avatar-based accessibility applications for the Arab Deaf community. We release the first SMPL-X parametric annotations for the Ishara-500 Saudi Sign Language dataset, enabling quantitative evaluation and...
CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction
Announce Type: replace Abstract: We ask whether everyday open-world monocular videos can be turned into reusable 4D interaction primitives: articulated hand motion, object shape with 6D pose over time, and the when/where of contact. Such a capability would enable scalable mining of real interactions and, beyond reconstruction, support scene-aware synthesis and planning. However, reconstructing hand-object interaction (HOI) from challenging monocular videos remains difficult: methods often...
Princeton365: A Diverse Dataset with Accurate Camera Pose
arXiv:2506.09035v2 Announce Type: replace Abstract: We introduce Princeton365, a large-scale diverse dataset of 365 videos with accurate camera pose. Our dataset bridges the gap between accuracy and data diversity in current SLAM benchmarks by introducing a novel ground truth collection framework that leverages calibration boards and a 360-camera. We collect indoor, outdoor, and object scanning videos with synchronized monocular and stereo RGB video outputs as well as IMU.
Dash2Sim: Closed-Loop Driving Simulation from in-the-wild Dashcam Videos
Announce Type: new Abstract: Self-driving simulations typically rely on data collected in a small number of cities or on hand-authored synthetic scenarios. Dashcam videos cover a far broader range of locations and situations, including rare or long-tailed scenarios. They are considered less usable for simulation because it is difficult to recover accurate 4D scenes from monocular in-the-wild videos.
AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation
arXiv:2602.04672v4 Announce Type: replace Abstract: Reconstructing dynamic hand-object interactions from monocular videos is critical for dexterous manipulation data collection and creating realistic digital twins for robotics and VR. However, current methods face two prohibitive barriers: (1) reliance on neural rendering often yields fragmented, non-simulation-ready geometries under heavy occlusion, and (2) dependence on brittle Structure-from-Motion (SfM) initialization leads to frequent...
Learning Global Motion with Compact Gaussians for Feed-Forward 4D Reconstruction
arXiv:2605.31595v1 Announce Type: new Abstract: Dynamic scene reconstruction from monocular video remains a fundamental challenge in computer vision. Existing feed-forward methods predict 3D Gaussians pixel-wise for each frame, suffering from duplicated Gaussians and view-dependent biases that hinder effective learning of scene motion. We present C4G, a feed-forward 4D reconstruction framework built upon a compact set of timestamp-conditioned learnable Gaussian query tokens.