Distilling LLM Feedback for Lean Theorem Proving

arXiv CS Monday 01 June 2026, 04:00 UTC By Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal, Pierre Marion

arXiv:2605.30861v1

Abstract: Post-training for reasoning models typically combines supervised fine-tuning with reinforcement learning from verifiable rewards, most commonly with GRPO. However, this algorithm suffers from sparse rewards, limited exploration, and mode collapse. Building upon recent work on self-distillation, we propose Feedback Distillation, a training method where the model is trained to match, at the token level, its own distribution conditioned on privileged feedback produced by a language model. Feedback Distillation offers token-level supervision and can inject external knowledge. Evaluating our method on Lean 4 theorem proving, we find that Feedback Distillation maintains greater diversity in generated trajectories than GRPO, yielding higher policy entropy and better pass@k scaling. The two methods are complementary: initializing GRPO from a Feedback Distillation checkpoint outperforms either method alone. Overall, our results suggest a promising avenue to improve post-training for complex reasoning.
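To make the token-level matching idea concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the authors' implementation: the abstract does not specify the divergence, its direction, or how logits are obtained, so this sketch assumes a forward KL divergence from the feedback-conditioned distribution (treated as a fixed target) to the unconditioned one, averaged over the tokens of a trajectory. The function names `log_softmax`, `token_kl`, and `feedback_distillation_loss` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) at a single token position.

    Assumption: forward KL; the source does not state which divergence is used.
    """
    lp = log_softmax(teacher_logits)
    lq = log_softmax(student_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def feedback_distillation_loss(teacher_logits_seq, student_logits_seq):
    """Mean per-token KL over one trajectory.

    teacher_logits_seq[t]: logits of the model when its context also
        contains the privileged feedback (hypothetical setup).
    student_logits_seq[t]: logits of the same model without the feedback.
    """
    if len(teacher_logits_seq) != len(student_logits_seq):
        raise ValueError("sequences must have the same number of tokens")
    if not teacher_logits_seq:
        raise ValueError("sequences must be non-empty")
    kls = [token_kl(t, s) for t, s in zip(teacher_logits_seq, student_logits_seq)]
    return sum(kls) / len(kls)


if __name__ == "__main__":
    # Toy vocabulary of size 3, trajectory of 2 tokens.
    with_feedback = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    without_feedback = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(feedback_distillation_loss(with_feedback, without_feedback))
```

Because every token position contributes its own term, the signal is dense along the trajectory, in contrast to a single sparse reward at the end. In a real training loop the logits would come from two forward passes of the same network, and gradients would flow only through the pass without feedback; both of those details are assumptions here as well.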

Originally published by arXiv CS.
