CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback

arXiv CS, Tuesday 02 June 2026, 04:00 UTC. By Bin Chen, Xinye Liao, Yiming Liu, Xin Liao, Chonghan Liu

Key Points

arXiv:2606.01830v1 (announce type: new). Abstract: Recent LLM search agents use reinforcement learning with verifiable rewards (RLVR) to learn search-augmented reasoning from outcome rewards. On hard problems, these agents rarely sample end-to-end successful rollouts, which leaves outcome-only RLVR with few positive-reward trajectories. We argue that improving learning on such problems requires additional guidance during training, and that RLVR already contains verifier-side information that can provide it. This information can identify errors or omissions in the agent's submitted answer and guide revision within the rollout. We propose a training-time mechanism called Credit-Attenuated Privileged Feedback (CAPF), which makes this verifier-side information available through a Privileged Feedback call during training. CAPF lets the policy revise zero-reward attempts into positive-reward repair trajectories, and it attenuates credit for the feedback call and earlier actions to accommodate deployment without this call. Experiments show that CAPF raises Qwen3-4B's average exact-match score across seven open-domain QA benchmarks from 44.7% under outcome-only RLVR to 48.5%.
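The abstract does not spell out how credit is attenuated, so the following is only an illustrative sketch of one way the idea could look, not the authors' method. It assumes a rollout is a list of labelled steps, that the Privileged Feedback call and every action before it have their advantage scaled by a hypothetical factor `alpha`, and that actions after the call keep full credit. The names `Step`, `credit_weights`, `step_advantages`, and `alpha` are all invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    # Hypothetical step labels: "search", "answer", or "feedback".
    kind: str


def credit_weights(steps: List[Step], alpha: float = 0.5) -> List[float]:
    """Per-step multipliers applied to a trajectory-level advantage.

    Assumption (not from the paper): the feedback call and all earlier
    actions are scaled by `alpha`; later actions keep a weight of 1.0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Find the index of the last feedback call, if any.
    last_feedback = None
    for i, step in enumerate(steps):
        if step.kind == "feedback":
            last_feedback = i
    if last_feedback is None:
        # No privileged feedback was used: every step keeps full credit.
        return [1.0] * len(steps)
    return [alpha if i <= last_feedback else 1.0 for i in range(len(steps))]


def step_advantages(
    steps: List[Step], reward: float, baseline: float, alpha: float = 0.5
) -> List[float]:
    """Spread (reward - baseline) over steps using the attenuated weights."""
    advantage = reward - baseline
    return [w * advantage for w in credit_weights(steps, alpha)]


# A repair trajectory: a failed attempt, a feedback call, then a revision.
repair = [Step("search"), Step("answer"), Step("feedback"),
          Step("search"), Step("answer")]
weights = credit_weights(repair, alpha=0.5)
advantages = step_advantages(repair, reward=1.0, baseline=0.25, alpha=0.5)
```

Under these assumptions, the revision steps that follow the feedback call receive the full advantage, while the steps that depended on information unavailable at deployment are down-weighted rather than discarded.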


Originally published by arXiv CS.

