Classifier-Free Guidance

Related Articles from SNS

GuidedBridge: Training-freely Improving Bridge Models with Prior Guidance

arXiv:2606.03119v1. Abstract: Guidance methods, such as classifier-free guidance (CFG) and auto-guidance (AG), have advanced noise-to-data generation in diffusion models. Recently, bridge models have introduced a data-to-data generative process that can exploit an instructive clean prior. In this work, inspired by previous methods creating quality difference between denoising results as guidance, we propose a training-free bridge guidance method, termed Prior Guidance (PG).

arXiv CS 7d ago

DTG-Restore: Training-Free Diffusion Refinement for Generative Video Super-Resolution

arXiv:2605.30431v1. Abstract: Recent progress in video diffusion models has enabled remarkable generative fidelity, yet leveraging these priors for restoration remains limited by the strong coupling between conditional and unconditional branches in standard classifier-free guidance. We introduce a training-free framework that enhances distorted and low-resolution videos by decoupling these signals in time. Our proposed Decoupled Time Guidance (DTG) evaluates the...

arXiv CS 9d ago

Magenta RealTime 2: Open and Local Live Music Models

We’re excited to share Magenta RealTime 2 (MRT2), a state-of-the-art open model and efficient real-time inference engine that enables you to build and play AI musical instruments on your laptop! To get started, download the apps on your MacBook (requires Apple Silicon). Unlike other large generative music models that work offline to turn a prompt into a track, MRT2 is a live, interactive model that you can control with MIDI and audio, in addition to text.

Hacker News 5d ago

LUCID: Learning Unified Control for Image Deflaring and Exposure Mastery in Nighttime Photography

arXiv:2606.06901v1. Abstract: Photography is the art of painting with light, yet nighttime scenes are shaped by competing degradations: intense flares obscure scene structure, while photon-limited regions collapse into noise. Conventional approaches address these factors in isolation, overlooking the fact that these degradations are fundamentally entangled.

arXiv CS 2d ago

Guidance for Low-Level Perceptual Editing in Unconditional Diffusion Models

Abstract: Unconditional diffusion models offer powerful generative priors, yet steering them toward aesthetically enhanced outputs remains largely unexplored. We show that h-space patching, the dominant paradigm for training-free diffusion editing, systematically fails for global, low-level transformations required for aesthetic and perceptual refinement. We introduce a novel, generalized framework for image-editing in unconditional diffusion models without explicit training.

arXiv CS 9d ago

Bionic Human-Motion Style Transfer for Physically Executable Whole-Body Control of Humanoid Robots

arXiv:2606.03536v1. Abstract: Expressive whole-body motion is important for humanoid robots operating in human environments, where robots are expected to move stably while presenting readable and adjustable body behaviors. However, most expressive motions are still obtained from fixed demonstrations or manually designed scripts, making it difficult to reuse a demonstrated style across different motion contents. Inspired by the way human motion styles convey affective and...

arXiv CS 7d ago

Optimal Transport Flow Matching by Design

arXiv:2606.04092v1. Abstract: Flow matching models learn to transport samples from a simple prior distribution to a complex data distribution. When prior-data pairs are coupled via optimal transport (OT), the learned trajectories are straight and non-crossing, enabling fast, even single-step, generation. However, computing the OT coupling in high dimensions is intractable, and existing methods attempt to solve the OT problem, at the cost of persistent bias or significant...

arXiv CS 6d ago

Spectral-Progressive Thought Flow for Lightweight Multimodal Reasoning

Abstract: Multimodal spatial reasoning often relies on long chains of intermediate textual and visual thoughts, where accumulating visual tokens and dense cross-modal attention incur substantial computation and memory overhead. To address this challenge, we propose Spectral-Progressive Thought Flow (SpecFlow), a novel lightweight multimodal spatial reasoning framework that represents intermediate visual thoughts in a fixed-size discrete cosine space. By exploiting strong...

arXiv CS 7d ago

Diffusion-Based Heart Sound Generation: Evaluation with Physiological Signal Metrics, Classifiers, and Expert Listening

arXiv:2606.02448v1. Abstract: Publicly available phonocardiogram (PCG) datasets remain limited in size and pathological diversity, constraining both auscultation training and the generalisation of automated heart-sound classifiers. A class-conditional diffusion model for PCG generation is developed in the log-mel domain and synthetic fidelity is assessed using complementary (i) physiology-inspired plausibility metrics, (ii) downstream label-consistency evaluation, and...

arXiv CS 8d ago

Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation

Abstract: Identity-preserving video generation (IPVG) aims to synthesize high-fidelity videos that follow text prompts while faithfully preserving a reference identity. Despite recent progress, existing IPVG methods still struggle to balance high-level semantic control and low-level identity fidelity. To bridge this gap, we propose ST-DRC, an effective Spatial-Temporal Decoupled Reference Conditioning framework for identity-preserving text-to-video generation.

arXiv CS 8d ago