AdaGRPO

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

AdaGRPO: A Capability-Aware Adaptive Enhancement for Flow-based GRPO

Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) has demonstrated remarkable success in aligning text-to-image (T2I) flow models with human preferences. However, we have identified that the learning loop of current flow-based GRPO is fundamentally decoupled from the learner's current capability, suffering from critical blind spots at both prompt selection and advantage estimation: (i) Existing methods sample prompts randomly, overlooking the substantial impact of data...

arXiv CS 2d ago

Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation

arXiv:2606.08480v1 Announce Type: new Abstract: Reinforcement learning (RL) presents a promising avenue for enhancing generative recommendation beyond supervised imitation, leveraging reward signals to guide policy improvement. However, its efficacy is critically contingent on the trustworthiness of the reward model for the samples it evaluates. In practice, production rankers, the widely adopted reward models, are trained on exposure-biased logs, leading to sample-dependent inaccuracies...

arXiv CS 1d ago

Sovereign News Station

Self-hosted. No tracking. No ads. Independent news intelligence powered by sovereign infrastructure.

Daily briefing to your inbox:

Subscribed. Welcome aboard.

Home Live Analysis Trending Analytics Operations RSS Feed About

Sovereign News Station — Independent news intelligence · Self-hosted · No tracking