Reinforcement Learning from Denoising Feedback

arXiv CS, Monday 08 June 2026, 04:00 UTC. By Qi He, Huan Chen, Ya Guo, Huijia Zhu, Yi R. Fung, Baojian Zhou

Key Points

arXiv:2605.25638v2 (announce type: replace)

Abstract: Policy loss estimation remains a fundamental and long-standing challenge in reinforcement learning (RL) for diffusion language models (DLMs). We introduce Reinforcement Learning from Denoising Feedback (RLDF), a novel training paradigm that leverages feedback obtained from rollout and training processes to facilitate accurate and efficient policy loss estimation. To balance the trade-off between computational efficiency and estimation effectiveness, RLDF optimizes the model toward the clipped clean state from intermediate noisy states, combined with weighted timestep sampling over denoising timesteps. Extensive experiments demonstrate that RLDF achieves consistent and substantial improvements in both performance and generalizability across two representative DLM architectures, LLaDA and Dream, on multiple reasoning benchmarks. Our work lays a principled foundation for scalable reinforcement learning in diffusion language models. We build Drift, a training framework for DLMs, available at https://github.com/ant-research/Drift.
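To make "weighted timestep sampling over denoising timesteps" concrete, here is a minimal Python sketch of the general idea for a masked diffusion language model: draw a timestep with probability proportional to a weight, then build an intermediate noisy state by masking each token of a clean sequence with probability equal to that timestep. This is an illustration under stated assumptions, not the RLDF implementation. The abstract does not give the weights, the clipping rule, or the loss, so the function names, the `[MASK]` token, and the example weights below are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, chosen for illustration only


def sample_timestep(timesteps, weights, rng):
    """Draw one timestep with probability proportional to its weight."""
    return rng.choices(timesteps, weights=weights, k=1)[0]


def mask_tokens(tokens, t, rng):
    """Build a noisy state: mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def make_training_pairs(clean, timesteps, weights, n, seed=0):
    """Return n (timestep, noisy sequence) pairs derived from one clean sequence.

    A training loop would then score the model's prediction of the clean
    tokens from each noisy sequence; that step is not shown here.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        t = sample_timestep(timesteps, weights, rng)
        pairs.append((t, mask_tokens(clean, t, rng)))
    return pairs


if __name__ == "__main__":
    clean = ["2", "+", "3", "=", "5"]
    # Assumed weights that favor noisier timesteps; the paper's choice may differ.
    for t, noisy in make_training_pairs(clean, [0.25, 0.5, 0.75], [1, 2, 3], n=4):
        print(t, noisy)
```

Sampling a few weighted timesteps instead of replaying every denoising step is one way to trade estimation quality against compute, which is the trade-off the abstract describes.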


Originally published by arXiv CS.
