SA-AH-GRPO

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

Announce Type: new Abstract: Group Relative Policy Optimisation (GRPO) has emerged as an effective reinforcement-learning algorithm for aligning language models on reasoning tasks, but it treats every token position and every sampled rollout symmetrically. We introduce two complementary extensions: (i) Adaptive-Horizon GRPO (AH-GRPO), which weights each token's policy gradient using a cumulative entropy-based discount that reduces the effective horizon when the model is uncertain, and (ii)...

arXiv CS 5d ago

Sovereign News Station

Self-hosted. No tracking. No ads. Independent news intelligence powered by sovereign infrastructure.

Daily briefing to your inbox:

Subscribed. Welcome aboard.

Home Live Analysis Trending Analytics Operations RSS Feed About

Sovereign News Station — Independent news intelligence · Self-hosted · No tracking