MulFeRL
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop
arXiv:2601.22900v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning across domains, but outcome-only scalar rewards are often sparse and uninformative. This limitation is especially severe for failed samples, where scalar rewards indicate only that a solution is incorrect without explaining why the reasoning breaks down. In this paper, we leverage richer verbal feedback to guide RLVR on failed samples and convert...