MMR-GRPO

Related Articles from SNS

MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting

arXiv:2601.09085v2 Announce Type: replace

Abstract: Group Relative Policy Optimization (GRPO) has become a standard approach for training mathematical reasoning models; however, its reliance on multiple completions per prompt makes training computationally expensive. Although recent work has reduced the number of training steps required to reach peak performance, the overall wall-clock training time often remains unchanged or even increases due to higher per-step cost. We propose MMR-GRPO,...
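
To illustrate why GRPO needs several completions per prompt, here is a minimal sketch of the group-relative advantage computation commonly associated with GRPO: each completion's reward is normalized against the mean and standard deviation of its own group. This shows plain GRPO only, not the MMR-GRPO reweighting, which the truncated abstract does not describe. The function name, the `eps` constant, the choice of population standard deviation, and the example rewards are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per completion sampled for the same
    prompt. `eps` (an assumed constant) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 completions of one prompt (1.0 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline is computed within the group, a single completion per prompt would carry no learning signal, and each additional completion adds generation cost to every training step.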

arXiv CS 1d ago