AxC

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR

Announce Type: new. Abstract: Reinforcement learning from verifiable rewards (RLVR) improves reasoning even when the reward signal is spurious -- assigning credit to the group-plurality answer rather than a ground-truth verifier. Practitioners commonly interpret naive = acc(TRUE) - acc(RANDOM) as the reward-design effect. We prove this estimand is systematically biased: it conflates self-consistency elicitation (sharpening the policy toward its modal answer via majority pseudo-reward) with...

arXiv CS 5d ago

A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR

arXiv:2606.05932v2. Announce Type: replace. Abstract: Reinforcement learning from verifiable rewards (RLVR) improves reasoning even when the reward signal is spurious -- assigning credit to the group-plurality answer rather than a ground-truth verifier. Practitioners commonly interpret naive = acc(TRUE) - acc(RANDOM) as the reward-design effect. We prove this estimand is systematically biased: it conflates self-consistency elicitation (sharpening the policy toward its modal answer via majority...

arXiv CS 1d ago