OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

arXiv CS Friday 05 June 2026, 04:00 UTC By Paavo Parmas, Yongmin Kim, Kohsei Matsutani, Shota Takashiro, Soichiro Nishimori, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo

Key Points

arXiv:2606.06096v1 Abstract: Policy-gradient methods usually optimize expected return, but many real-world applications care about distributional properties of returns: tail risk, outlier robustness, or best-of-K discovery. We introduce OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order-statistic objectives. OrderGrad optimizes finite-sample L-statistics, i.e., weighted averages of sorted rewards or costs, recovering objectives such as VaR, CVaR, trimmed means, medians, and top-m/best-of-K criteria by changing only the rank weights. For any fixed sample size and rank-weight vector, OrderGrad provides an unbiased gradient estimator for the corresponding order-statistic objective. The method is implemented as a simple reward transformation that can then be used in an otherwise standard policy-gradient or reparameterized update. We study the resulting estimator's variance behavior and evaluate it on tasks where mean optimization is mismatched to the deployment objective, including LLM math post-training, among other tasks. OrderGrad provides a unified, plug-and-play route to risk-averse, robust, and exploratory learning. Code: https://github.com/paavo5/ordergrad
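To make the phrase "weighted averages of sorted rewards" concrete, the following is a minimal sketch of a finite-sample L-statistic evaluated under a few different rank-weight vectors. It illustrates only the objective being described, not the OrderGrad gradient estimator or its reward transformation, which the abstract does not spell out. All function names (l_statistic, mean_weights, and so on) are hypothetical and chosen for illustration; they are not taken from the linked repository. The sketch assumes rewards are sorted in ascending order, so the first rank is the worst sample and the last rank is the best.

```python
import numpy as np


def l_statistic(rewards, rank_weights):
    """Return sum_i w_i * r_(i), where r_(1) <= ... <= r_(K) are the sorted rewards."""
    rewards = np.asarray(rewards, dtype=float)
    rank_weights = np.asarray(rank_weights, dtype=float)
    if rewards.shape != rank_weights.shape:
        raise ValueError("rewards and rank_weights must have the same length")
    return float(np.dot(rank_weights, np.sort(rewards)))


# Hypothetical helpers: each builds a rank-weight vector for K samples.
def mean_weights(k):
    """Uniform weights recover the ordinary sample mean."""
    return np.full(k, 1.0 / k)


def best_of_k_weights(k):
    """All weight on the largest reward (best-of-K)."""
    w = np.zeros(k)
    w[-1] = 1.0
    return w


def lower_tail_weights(k, m):
    """Average of the m smallest rewards (a CVaR-style lower-tail criterion)."""
    w = np.zeros(k)
    w[:m] = 1.0 / m
    return w


def median_weights(k):
    """Weight on the middle rank, or split across the two middle ranks if k is even."""
    w = np.zeros(k)
    if k % 2 == 1:
        w[k // 2] = 1.0
    else:
        w[k // 2 - 1] = 0.5
        w[k // 2] = 0.5
    return w


def trimmed_mean_weights(k, trim):
    """Uniform weights after dropping `trim` samples from each end."""
    if 2 * trim >= k:
        raise ValueError("trim removes every sample")
    w = np.zeros(k)
    w[trim:k - trim] = 1.0 / (k - 2 * trim)
    return w


rewards = [3.0, -1.0, 4.0, 1.0, 5.0]  # made-up sample of K = 5 rewards
k = len(rewards)
results = {
    "mean": l_statistic(rewards, mean_weights(k)),
    "best_of_k": l_statistic(rewards, best_of_k_weights(k)),
    "lower_tail_2": l_statistic(rewards, lower_tail_weights(k, 2)),
    "median": l_statistic(rewards, median_weights(k)),
    "trimmed_1": l_statistic(rewards, trimmed_mean_weights(k, 1)),
}
print(results)
```

Only the weight vector changes between these criteria; the sort-then-weight computation is shared, which is the sense in which the abstract describes recovering different objectives "by changing only the rank weights".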


Originally published by arXiv CS.


