S2L-PO

Related Articles from SNS

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

arXiv:2605.30789v2 (announce type: replace)

Abstract: We identify a new dimension for enhancing rollout diversity in Group Relative Policy Optimization (GRPO) for LLMs. While GRPO relies on diverse rollouts, prevailing strategies primarily increase diversity by injecting more token-level randomness, which may introduce step-wise noise and lead to incoherent trajectories. We uncover that smaller models within the same model family inherently exhibit higher policy-level diversity, indicated by...

arXiv CS 7d ago
