CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Shijie Zhang, Zheng Xiao, Shiyu Liu, Guohao Sun, Kevin Zhang, Xiang Guo, Rujun Guo, Shaoyu Liu, Wangxiao Zhao, and Guanjun Jiang.

Key Points

arXiv:2509.25004v2 (Announce Type: replace)

Abstract: Online reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning abilities of large language models, but most methods still optimize reasoning trajectories over a static problem set, wasting rollout budget on problems that are already solved or are too difficult. We propose CLPO (Curriculum Learning meets Policy Optimization), a self-evolving curriculum framework that uses on-policy rollout accuracy to identify solved, medium-difficulty, and hard problems, then restructures selected tasks according to the model's current capability. Hard problems are simplified so that they become learnable, while medium-difficulty problems are diversified to provide useful training variation. This allows the curriculum to co-evolve with the policy instead of remaining fixed as the model's capability boundary shifts.

Rather than treating these rewrites as static data augmentation, CLPO optimizes the restructuring trajectories themselves, assigning credit according to the downstream accuracy gain of the rewritten problem; it requires no human annotations beyond the original verifiable answers. Experiments across mathematical reasoning and out-of-domain general reasoning benchmarks show that CLPO outperforms GRPO and DAPO on Qwen3-8B by 10.21 and 7.75 average points, respectively. Ablation studies on math and code domains further show that both the restructuring mode and the rewriting loss contribute to the final gains, demonstrating that CLPO provides a scalable and robust pathway for eliciting stronger reasoning capabilities through a self-evolving curriculum.
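The abstract does not state CLPO's exact difficulty thresholds or credit formula, so the following is only a minimal Python sketch of the idea under stated assumptions: each problem is bucketed by the fraction of correct on-policy rollouts, hard problems are marked for simplification, medium ones for diversification, and solved ones are skipped; a rewrite's credit is taken as the rewritten problem's accuracy minus the original's. The thresholds (`hard_below=0.2`, `solved_at=1.0`), the helper names (`classify`, `plan_curriculum`, `rewrite_credit`), and the plain difference used as credit are all hypothetical and are not the authors' implementation.

```python
from enum import Enum
from typing import Dict, List, Optional


class Bucket(Enum):
    SOLVED = "solved"
    MEDIUM = "medium"
    HARD = "hard"


def rollout_accuracy(correct_flags: List[bool]) -> float:
    """Fraction of on-policy rollouts whose answer passed the verifier."""
    if not correct_flags:
        raise ValueError("need at least one rollout")
    return sum(correct_flags) / len(correct_flags)


def classify(acc: float, hard_below: float = 0.2, solved_at: float = 1.0) -> Bucket:
    """Bucket a problem by rollout accuracy (both thresholds are assumptions)."""
    if acc >= solved_at:
        return Bucket.SOLVED
    if acc < hard_below:
        return Bucket.HARD
    return Bucket.MEDIUM


def restructure_mode(bucket: Bucket) -> Optional[str]:
    """Hard -> simplify, medium -> diversify, solved -> no rewrite this round."""
    if bucket is Bucket.HARD:
        return "simplify"
    if bucket is Bucket.MEDIUM:
        return "diversify"
    return None


def plan_curriculum(results: Dict[str, List[bool]]) -> Dict[str, Optional[str]]:
    """Map each problem id to the restructuring action chosen for it."""
    return {
        pid: restructure_mode(classify(rollout_accuracy(flags)))
        for pid, flags in results.items()
    }


def rewrite_credit(acc_original: float, acc_rewritten: float) -> float:
    """Hypothetical credit for a rewrite: accuracy gain of the rewritten problem."""
    return acc_rewritten - acc_original


if __name__ == "__main__":
    rollouts = {
        "p1": [True, True, True, True],      # always correct: solved, skipped
        "p2": [False, False, False, False],  # never correct: hard, simplify
        "p3": [True, False, True, False],    # 50% correct: medium, diversify
    }
    print(plan_curriculum(rollouts))
    print(rewrite_credit(0.125, 0.5))
```

Skipping solved problems is what saves rollout budget in this sketch; a real system would also need the policy to generate the simplified or diversified rewrites and a policy-gradient update that uses `rewrite_credit` as its reward, both of which are omitted here.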


Originally published by arXiv CS.
