CoT-Space: A Theoretical Framework for Internal Slow-Thinking via Reinforcement Learning

arXiv CS, Friday 05 June 2026, 04:00 UTC. By Zeyu Gan, Hao Yi, Yong Liu

Key Points

arXiv:2509.04027v3. Abstract: Test-time scaling, primarily manifested through multi-step Chain-of-Thought (CoT) reasoning via Reinforcement Learning (RL), has emerged as a pivotal paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). However, a significant theoretical gap persists: traditional token-level analysis fails to capture the macroscopic dynamics of reasoning-level scaling. To address this, we introduce CoT-Space, a novel theoretical framework that recasts the reasoning process from a discrete token-prediction task to an optimization process within a continuous, reasoning-level semantic space. By modeling the reasoning trajectory from both noise and risk perspectives and revitalizing foundational principles from classical learning theory, we demonstrate that the observed convergence to an optimal CoT length is a natural consequence of the fundamental trade-off between underfitting and overfitting. We further utilize RL as a tool to elicit and verify these results in our experiments. Our findings provide a mechanistic explanation for the internal test-time scaling via RL, offering a principled theoretical foundation to optimize reasoning trajectories in modern LLMs.


Originally published by arXiv CS.
