Camera Control

Related Articles from SNS

CameraNoise: Enabling Faithful Camera Control in Video Diffusion through Geometry-Flow-Guided Noise Warping

Announce Type: new Abstract: Precise camera pose control is critical for video diffusion, yet maintaining geometric consistency remains a challenge. Existing methods that directly inject numerical camera parameters into the diffusion backbone often fail to bridge the gap between abstract coordinates and visual content, leading to structural distortions. To address this issue, we propose CameraNoise, a flow-to-noise warping method that encodes camera motion into a temporally coherent...

arXiv CS 9d ago

Prisma-World: Camera-Controllable Multi-Agent Video World Model

arXiv:2606.09507v1 Announce Type: new Abstract: Video world models have made rapid progress in generating controllable visual experiences, but most of them still simulate the world from a single observer. Extending such models to multiple agents raises a central challenge: if each agent's future state is generated independently, overlapping views may instantiate different versions of the same scene, leading to inconsistent objects, layouts, and appearances across agents. Conventional camera...

arXiv CS 1d ago

PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention

Announce Type: replace Abstract: We propose PostCam, a streamlined framework for novel-view video generation that achieves superior detail preservation and precise camera trajectory editing in dynamic scenes. Current methods often struggle with a trade-off between pose-based control, which lacks visual detail, and rendering-based guidance, which is overly sensitive to geometric accuracy. Despite recent hybrid attempts, achieving precise motion and visual consistency remains challenging due...

arXiv CS 9d ago

DisCo: World Models with Discrete Camera Motion Control

arXiv:2606.07967v1 Announce Type: new Abstract: Controllable video world models target interactive world exploration, where models must faithfully execute explicit action commands while preserving visual quality and temporal coherence. However, most existing approaches rely on continuous camera trajectories as action conditions, which often lead to unreliable action following, especially under complex motion sequences. In this work, we identify action representation entanglement as a key...

arXiv CS 1d ago

Multi-Robot Planning and Control from CCTV Camera Networks in a Real Warehouse

arXiv:2606.06762v1 Announce Type: new Abstract: Off-board control of mobile robots from cameras embedded in the environment offers a practical path to scalable autonomy, moving sensing and compute off the robots. We extend this idea from the single-robot case to coordinated fleets in a real warehouse, driving multiple robots with only a distributed CCTV network and edge compute. The system operates entirely in image space over an uncalibrated, pixel-wise topological camera graph, enabling...

arXiv CS 2d ago

Auteur: Language-Driven Cinematographic Framing for Human-Centric Video Generation

arXiv:2606.01900v1 Announce Type: new Abstract: Generative video models have achieved remarkable visual fidelity and temporal coherence, yet intentional camera control remains elusive. Existing frameworks treat camera motion as a byproduct of pixel synthesis, producing trajectories that are stochastic, spatially inconsistent, and indifferent to the human subject driving the scene. In this work, we present Auteur, a method for language-driven, human-centric camera framing in generative video.

arXiv CS 8d ago

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

Announce Type: replace Abstract: Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency constraints that remain challenging for perspective video generators due to their limited field of view (FoV). Their narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift.

arXiv CS 1d ago

Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

arXiv:2605.31158v1 Announce Type: new Abstract: Interactive video world models generate video chunk by chunk in response to user-controlled camera movements, enabling applications such as real-time game simulation, virtual scene navigation, and embodied AI training. However, scaling to long interactive trajectories is prohibitively expensive due to growing context memory, quadratic attention complexity, and repeated denoising steps. We present Light Interaction, a training-free inference...

arXiv CS 9d ago

Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models

arXiv:2503.08434v5 Announce Type: replace Abstract: Recent advances in large-scale text-to-image models have revolutionized creative fields by generating visually captivating outputs from textual prompts; however, while traditional photography offers precise control over camera settings to shape visual aesthetics - such as depth-of-field via aperture - current diffusion models typically rely on prompt engineering to mimic such effects. This approach often results in crude approximations and...

arXiv CS 1d ago