Boundary-Guided Policy Optimization for Memory
Related Articles from SNS
Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
arXiv:2510.11683v3 (Announce Type: replace)
Abstract: A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) is the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation during training. While existing methods approximate the log-likelihoods by their evidence lower bounds (ELBOs) via customized Monte Carlo (MC) sampling, they incur significant memory overhead due to the need...
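The following is a minimal sketch of the kind of Monte Carlo ELBO estimate the abstract refers to, for a masked-diffusion dLLM. It is an assumption-laden illustration, not the paper's method. It uses a discrete variant of the masked-diffusion bound: sample the number of masked tokens l uniformly from 1 to L, mask l random positions, and weight the sum of masked-token log-probabilities by L/l. The `MASK` id, the `TokenLogProb` interface, and the helpers `mc_elbo` and `uniform_model` are hypothetical names chosen for this sketch. Each MC sample costs one model forward pass. In an autodiff framework, differentiating the averaged estimate generally keeps every sample's computation graph alive until the backward pass, so activation memory tends to grow with the number of samples.

```python
import math
import random
from typing import Callable, Sequence

MASK = -1  # hypothetical mask-token id (assumption for this sketch)

# Hypothetical model interface: given the partially masked sequence, a position,
# and the true token there, return log p_theta(true token | masked sequence).
TokenLogProb = Callable[[Sequence[int], int, int], float]


def mc_elbo(tokens: Sequence[int], log_prob: TokenLogProb,
            num_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of a masked-diffusion ELBO on log p(tokens).

    Assumed estimator: l ~ Uniform{1..L}, mask l random positions,
    score the masked positions, and weight the sum by L / l.
    """
    L = len(tokens)
    total = 0.0
    for _ in range(num_samples):
        l = rng.randint(1, L)                 # number of masked tokens (inclusive range)
        masked_pos = rng.sample(range(L), l)  # distinct positions to mask
        noisy = list(tokens)
        for i in masked_pos:
            noisy[i] = MASK
        # One "forward pass" per MC sample: score every masked position.
        sample_sum = sum(log_prob(noisy, i, tokens[i]) for i in masked_pos)
        total += (L / l) * sample_sum
    return total / num_samples


def uniform_model(vocab_size: int) -> TokenLogProb:
    """Toy stand-in for a dLLM: ignores context, uniform over the vocabulary."""
    def log_prob(noisy: Sequence[int], pos: int, target: int) -> float:
        return -math.log(vocab_size)
    return log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    seq = [3, 1, 4, 1, 5]
    est = mc_elbo(seq, uniform_model(10), num_samples=8, rng=rng)
    print(est)  # ≈ -11.51, i.e. 5 * log(1/10) for this toy model
```

With the uniform toy model every sample yields exactly L·log(1/V), so the estimate does not depend on the seed. A real dLLM would make each term depend on the unmasked context, so individual samples would differ and more samples would be needed to reduce variance.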