VBench Quality

Related Articles from SNS

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

arXiv:2605.15141v3 Announce Type: replace Abstract: Real-time interactive video generation requires low-latency, streaming, and controllable rollout. Existing autoregressive (AR) diffusion distillation methods have achieved strong results in the chunk-wise 4-step regime by distilling bidirectional base models into few-step AR students, but they remain limited by coarse response granularity and non-negligible sampling latency. In this paper, we study a more aggressive setting: frame-wise...

arXiv CS 8d ago

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

arXiv:2605.15141v2 Announce Type: replace Abstract: Real-time interactive video generation requires low-latency, streaming, and controllable rollout. Existing autoregressive (AR) diffusion distillation methods have achieved strong results in the chunk-wise 4-step regime by distilling bidirectional base models into few-step AR students, but they remain limited by coarse response granularity and non-negligible sampling latency. In this paper, we study a more aggressive setting: frame-wise...

arXiv CS 9d ago

DSA: Dynamic Step Allocation for Fast Autoregressive Video Generation

arXiv:2606.04432v1 Announce Type: new Abstract: Video diffusion transformers have achieved state-of-the-art visual quality, but their high inference cost remains a major bottleneck for real-time applications. Recent distillation frameworks produce autoregressive video diffusion models with reduced latency, yet these models still use a fixed number of denoising steps per frame, wasting computation on predictable frames and under-refining challenging ones. We present DSA, a confidence-guided...

arXiv CS 6d ago

LVSA: Training-Free Sparse Attention for Long Video Diffusion

arXiv:2605.31057v1 Announce Type: new Abstract: Dense self-attention is the compute and quality bottleneck of long-video diffusion inference: cost grows quadratically with the sequence length, and beyond the training horizon the model converges to near-static output, that is, "frozen" repetitive video. State of the art approaches are either too costly, e.g., they require retraining, or fail to satisfy both performance and quality objectives in a scalable manner. To this end, we introduce...

arXiv CS 9d ago

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

arXiv:2606.02553v1 Announce Type: new Abstract: Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention during generation. This creates an irreversible generation trajectory: once the active window accumulates appearance errors, subsequent generations can only condition on this degraded trajectory and drift further away.

arXiv CS 8d ago

Physics-Informed Video Generation via Mixture-of-Experts Latent Alignment

arXiv:2606.04737v1 Announce Type: new Abstract: Large-scale video generation models have made remarkable progress in semantic consistency and visual quality, producing videos that are increasingly coherent and visually convincing. Nevertheless, the dynamics induced by pixel-level fitting do not naturally accommodate the regularities that govern real-world motion and interaction, resulting in persistent shortcomings in physical plausibility. To address this limitation, we propose...

arXiv CS 6d ago

Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

arXiv:2605.31603v1 Announce Type: new Abstract: Connector-based video unified models have demonstrated strong capability in instruction-grounded video synthesis, but integrating a large high-fidelity generator into the unified training loop is computationally prohibitive, limiting achievable visual quality. We therefore propose Lumos-Nexus, a training-efficient unified video generation framework that facilitates the development of strong reasoning-driven generation capabilities while...

arXiv CS 9d ago

ReCache: Learning Budget-Aware Caching Schedules for Diffusion Models via REINFORCE

Announce Type: new Abstract: Modern diffusion models generate high-quality images and videos, but their iterative denoising process makes inference expensive. Feature caching accelerates sampling by reusing or predicting intermediate activations across neighboring denoising steps, exploiting the redundancy of computations along the reverse trajectory. In this work, we focus on the caching schedule: selecting which denoising steps should be fully recomputed.

arXiv CS 5d ago