Cross-Epoch Adaptive Rollout Optimization
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Cross-Epoch Adaptive Rollout Optimization for RL Post-Training
arXiv:2606.05606v1 Announce Type: new Abstract: LLM post-training often relies on reinforcement learning methods that sample multiple rollouts per prompt, yet most existing approaches use a fixed rollout budget for every prompt, despite large differences in the training signal different prompts provide. In this paper, we study adaptive rollout allocation under a fixed global budget and formulate the problem as online resource allocation with prompt-level diminishing returns. Our method,...