Home Science Generative Drifting is Secretly Score Matching: a...
Science

Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective

Key Points

Announce Type: replace Abstract: Generative Modeling via Drifting~\citep{deng2026drifting} has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet its success is largely empirical and its theoretical foundations remain poorly understood. We observe that \emph{under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions}. This answers three questions left open in the original work: (1) whether a...

arXiv:2603.09936v2 Announce Type: replace Abstract: Generative Modeling via Drifting~\citep{deng2026drifting} has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet its success is largely empirical and its theoretical foundations remain poorly understood. We observe that \emph{under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions}. This answers three questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. Our observations position drifting within the score-matching family. By linearizing the McKean-Vlasov dynamics and probing them in Fourier space, we reveal frequency-dependent convergence timescales comparable to \emph{Landau damping} in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, potentially explaining the empirical preference for the Laplacian kernel. This suggests a fix: an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator is not a heuristic but is derived from the frozen-field discretization mandated by the Jordan-Kinderlehrer-Otto (JKO) scheme, and removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, which we demonstrate with a Sinkhorn divergence drift. We validate our analysis on toy datasets and scale it up to ImageNet.
Spectral (ORG) the McKean-Vlasov (LOCATION) Fourier (ORG) Laplacian (PERSON) e^{-rt}$ (ORG) K_{\max})$. (ORG) Wasserstein (LOCATION) JKO (ORG) ImageNet (ORG)
Originally published by arXiv CS Read original →