d2-AnyOrder
Related Articles from SNS
d2: Improving Reasoning in Diffusion Language Models via Trajectory Likelihood Estimation
arXiv:2509.21474v4. Abstract: While diffusion language models (DLMs) have achieved competitive performance in text generation, improving their reasoning ability with reinforcement learning remains an active research area. Here, we introduce d2, a reasoning framework tailored for masked DLMs. Central to our framework is a new policy gradient algorithm that relies on accurate estimates of the sampling trajectory likelihoods.