Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Lingkai Kong, Anagha Satish, Hezi Jiang, Akseli Kangaslahti, Andrew Ma, Wenbo Chen, Mingxiao Song, Lily Xu, Milind Tambe.

Key Points

arXiv:2601.22211v2. Abstract: Reinforcement learning (RL) with combinatorial action spaces remains challenging because feasible action sets are exponentially large and governed by complex feasibility constraints, making direct policy parameterization impractical. Existing approaches embed task-specific value functions into constrained optimization programs or learn deterministic structured policies, sacrificing generality and policy expressiveness. We propose a solver-induced latent spherical flow policy that brings the expressiveness of modern generative policies to combinatorial RL while guaranteeing feasibility by design. Our method, LSFlow, learns a stochastic policy in a compact continuous latent space via spherical flow matching, and delegates feasibility to a combinatorial optimization solver that maps each latent sample to a valid structured action. To improve efficiency, we train the value network directly in the latent space, avoiding repeated solver calls during policy optimization. To address the piecewise-constant and discontinuous value landscape induced by solver-based action selection, we introduce a smoothed Bellman operator that yields stable, well-defined learning targets. Empirically, our approach outperforms state-of-the-art baselines by an average of 20.6% across a range of challenging combinatorial RL tasks.
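The idea of delegating feasibility to a solver can be illustrated with a toy sketch. The abstract does not specify the solver, the constraint set, or how the latent enters the optimization, so everything below is an assumption made for illustration: the latent is treated as the cost vector of a linear objective, the feasible set is "choose exactly k of n items", and the hypothetical `top_k_solver` stands in for a real combinatorial solver. Uniform sampling on the sphere replaces the learned spherical flow, which is not reproduced here.

```python
import math
import random


def sample_unit_vector(dim, rng):
    """Draw a point uniformly on the unit sphere by normalizing a Gaussian sample.

    Assumption: this stands in for a sample from the learned latent policy.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k_solver(cost, k):
    """Hypothetical stand-in solver: maximize cost . a over binary a with exactly k ones."""
    chosen = sorted(range(len(cost)), key=lambda i: cost[i], reverse=True)[:k]
    action = [0] * len(cost)
    for i in chosen:
        action[i] = 1
    return action


rng = random.Random(0)
latent = sample_unit_vector(6, rng)

# Every latent maps to a feasible action: exactly k items are selected.
action = top_k_solver(latent, k=2)

# Rescaling the latent by a positive constant does not change the solver's output,
# so under a linear objective only the direction of the latent matters.
scaled_action = top_k_solver([2.0 * x for x in latent], k=2)
```

Under these assumptions, feasibility holds for any latent sample because the solver only ever returns members of the feasible set. The invariance to positive rescaling is one plausible reason a sphere is a natural latent domain when the objective is linear, though the abstract itself does not state this motivation.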


Originally published by arXiv CS.
