From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

arXiv CS, Monday 08 June 2026, 04:00 UTC. By Yuhang Zhou, Yixin Cao, Guangnan Ye

Key Points

arXiv:2606.07190v1, announce type: new. Abstract: Reasoning prefixes shape the future trajectory of LLM problem solving, yet existing process reward models usually evaluate them through local step correctness. We argue that correctness is a useful but indirect proxy for the effect we ultimately care about: whether a prefix increases the probability of successful completion. We define this effect as prefix gain, the solve-rate improvement induced by conditioning a group of lightweight student models on a prefix, and use it to train a Prefix Utility Model (PUM) with a simple pairwise ranking objective. PUM learns outcome-grounded prefix utility and can score both complete trajectories and partial reasoning prefixes. Across Best-of-$N$ selection, beam search, and reinforcement learning on mathematical reasoning, PUM provides a strong prefix-level supervision signal, especially when candidate pools are large, search budgets increase, or rule-based rewards are sparse. We release all data, models, and code at https://zhiqix.github.io/pum-project-page.
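
To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that prefix gain is the student group's solve rate with the prefix minus its solve rate without it, and that the pairwise ranking objective is a standard logistic (Bradley-Terry style) loss. The function names `solve_rate`, `prefix_gain`, and `pairwise_ranking_loss` are hypothetical.

```python
import math
from typing import Sequence


def solve_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled completions that reached a correct final answer."""
    if not outcomes:
        raise ValueError("need at least one sampled completion")
    return sum(outcomes) / len(outcomes)


def prefix_gain(with_prefix: Sequence[bool], without_prefix: Sequence[bool]) -> float:
    """Assumed definition: solve rate conditioned on the prefix minus the
    unconditioned solve rate, both measured on the student model group."""
    return solve_rate(with_prefix) - solve_rate(without_prefix)


def pairwise_ranking_loss(score_hi: float, score_lo: float) -> float:
    """Assumed objective: -log(sigmoid(score_hi - score_lo)), where score_hi
    is the model's score for the prefix with the larger measured gain.
    Written as a numerically stable softplus of -(score_hi - score_lo)."""
    diff = score_hi - score_lo
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Toy data: the students solve 1 of 4 attempts unaided and 3 of 4 with the prefix.
baseline = [True, False, False, False]
conditioned = [True, True, True, False]
gain = prefix_gain(conditioned, baseline)  # 0.75 - 0.25 = 0.5
```

Under these assumptions, a prefix with positive gain helps the students and one with negative gain hurts them. The loss shrinks when the utility model scores the higher-gain prefix above the lower-gain one and grows when the order is reversed.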


Originally published by arXiv CS.

