Home Knowledge Base GCPO

GCPO

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization

arXiv:2605.29198v2 Announce Type: replace Abstract: Group-advantage-based reinforcement learning methods, such as GRPO and DAPO, have demonstrated strong performance across diverse domains, including mathematical reasoning and text-to-image generation. However, their reliance on sample-level rewards introduces a key limitation as uniform credit assignment across all tokens fails to capture fine-grained, token-level contributions. To address this issue, we propose Guidance Contrastive Policy...

arXiv CS 9d ago