Home Knowledge Base PAEC

PAEC

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

PAEC: Position-Aware Entropy Calibration for LLM Reasoning in RLVR

arXiv:2606.08543v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the policy prematurely concentrates on narrow high-probability reasoning paths. While global entropy regularization can encourage exploration, uniformly increasing entropy across all token positions is inefficient for long reasoning trajectories, where many tokens are not decision-relevant. We...

arXiv CS 1d ago