Subspace-Adaptive

Related Articles from SNS

SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter

Abstract: Reinforcement learning with verifiable rewards (RLVR) often adopts GRPO-style group-relative updates, sampling multiple rollouts per prompt to construct normalized learning signals. However, merely increasing the number of rollouts does not reliably strengthen learning: under GRPO-style group normalization, per-rollout policy-gradient features can concentrate into a low-rank, signed geometry, causing substantial cancellation during aggregation and weakening the...

arXiv CS 5d ago
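
The cancellation described in the abstract above can be illustrated with a toy sketch. It assumes the commonly used GRPO-style normalization (reward minus group mean, divided by group standard deviation), under which advantages within a group sum to approximately zero. The function names `group_normalized_advantages` and `aggregate`, the reward values, and the feature vector are all hypothetical choices for illustration. The rank-one case shown here (every rollout sharing the same feature vector) is an extreme simplification, and none of this is the SALT method itself.

```python
import numpy as np

def group_normalized_advantages(rewards, eps=1e-8):
    """GRPO-style group normalization (assumed form): (r - mean) / (std + eps).

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead.
    """
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def aggregate(features, advantages):
    """Advantage-weighted mean of per-rollout feature vectors."""
    return (advantages[:, None] * features).mean(axis=0)

# Hypothetical binary verifiable rewards for 8 rollouts of one prompt.
rewards = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
adv = group_normalized_advantages(rewards)

# Extreme rank-one case: every rollout has the same feature vector.
direction = np.array([1.0, 2.0, -1.0])
shared = np.tile(direction, (len(rewards), 1))

# Zero-mean advantages times identical features cancel in the aggregate.
agg = aggregate(shared, adv)
print("mean advantage:", adv.mean())
print("aggregated update norm:", np.linalg.norm(agg))
```

In this toy setting, adding more rollouts with the same shared feature vector would not change the outcome, since the normalized advantages still average to zero; the aggregate only becomes informative when per-rollout features differ along directions correlated with the advantages.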