TraceLift

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards

arXiv:2605.03862v4. Abstract: Reinforcement learning with verifiable rewards has become a common way to improve explicit reasoning in large language models, but final-answer correctness alone does not reveal whether the reasoning trace is faithful, reliable, or useful to the model that consumes it. This outcome-only signal can reinforce traces that are right for the wrong reasons, overstate reasoning gains by rewarding shortcuts, and propagate flawed intermediate states...

arXiv CS 1d ago