Escaping the Verifier: Learning to Reason via Demonstrations

arXiv CS, Friday 05 June 2026, 04:00 UTC. By Locke Cai, Max Ryabinin, Ivan Provilkov

Key Points

arXiv:2511.21667v4

Abstract: Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We introduce RARO (Relativistic Adversarial Reasoning Optimization), which learns strong reasoning capabilities from expert demonstrations alone via Inverse Reinforcement Learning. RARO sets up an adversarial game between a policy and a relativistic critic: the policy learns to mimic expert answers, while the critic aims to identify the expert answer in each expert-policy answer pair. Both the policy and the critic are trained jointly and continuously via RL, and we identify the key stabilization techniques required for robust learning. Empirically, RARO significantly outperforms strong verifier-free baselines across all evaluation tasks: +13.7% accuracy on Countdown (1.5B), +8.2% accuracy on DeepMath (7B), and +19.1% win-rate on Poetry Writing (7B) against expert poems. RARO also exhibits robust scaling trends similar to those of RL with verifiers. These results demonstrate that RARO effectively elicits strong reasoning performance from expert demonstrations alone, enabling robust reasoning learning even when task-specific verifiers are unavailable.
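To make the adversarial game in the abstract concrete, here is a minimal sketch of how rewards for one expert-policy answer pair might be assigned. It is an illustration under stated assumptions, not the authors' implementation: the `Critic` signature, the `pairwise_rewards` helper, the binary 0/1 rewards, and the random ordering of the pair are all hypothetical choices, and the stabilization techniques the abstract mentions are not modeled.

```python
import random
from typing import Callable, Tuple

# Hypothetical critic interface (an assumption, not from the paper):
# given a question and two candidate answers, return 0 or 1, the index
# of the answer the critic believes came from the expert.
Critic = Callable[[str, str, str], int]


def pairwise_rewards(
    question: str,
    expert_answer: str,
    policy_answer: str,
    critic: Critic,
    rng: random.Random,
) -> Tuple[float, float]:
    """Return (policy_reward, critic_reward) for one expert-policy pair."""
    # Randomize the presentation order so the critic cannot rely on position.
    if rng.random() < 0.5:
        first, second, expert_index = expert_answer, policy_answer, 0
    else:
        first, second, expert_index = policy_answer, expert_answer, 1

    guess = critic(question, first, second)

    # Zero-sum assumption: the critic is rewarded for spotting the expert,
    # and the policy is rewarded when the critic is fooled.
    critic_reward = 1.0 if guess == expert_index else 0.0
    policy_reward = 1.0 - critic_reward
    return policy_reward, critic_reward


if __name__ == "__main__":
    # Toy critic for demonstration only: guesses that the longer answer is the expert's.
    def longer_is_expert(q: str, a: str, b: str) -> int:
        return 0 if len(a) >= len(b) else 1

    rng = random.Random(0)
    print(pairwise_rewards("2+2?", "The sum is 4.", "4", longer_is_expert, rng))
```

In a full training loop, both rewards would feed separate RL updates for the policy and the critic; the sketch only covers the reward assignment for a single pair.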

Originally published by arXiv CS.
