Test-time reward-guided alignment of language models by importance sampling on pre-logit space

arXiv CS, Thursday 04 June 2026, 04:00 UTC. By Sekitoshi Kanai, Tsukasa Yoshida, Hiroshi Takahashi, Haru Kuroki, Kazumune Hashimoto

Key Points

arXiv:2510.26219v3. Abstract: Test-time alignment of large language models (LLMs) has attracted attention because fine-tuning LLMs is computationally expensive. In this paper, we propose a new test-time reward-guided alignment method, adaptive importance sampling on pre-logits (AISP), based on sampling-based model predictive control with a stochastic control input. AISP applies a Gaussian perturbation to the pre-logits, which are the outputs of the penultimate layer, and maximizes the expected reward with respect to the mean of the perturbation. We demonstrate that the optimal mean is obtained by importance sampling with sampled rewards. AISP outperforms best-of-n sampling in reward obtained per number of samples used and achieves higher rewards than other reward-based test-time alignment methods.
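The idea in the abstract (perturb the pre-logits with Gaussian noise, score the resulting samples, and re-estimate the perturbation mean by importance sampling) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the single-token "model" (a fixed pre-logit vector `z` and output matrix `W`), the per-token reward table, the function names, the hyperparameters, and in particular the weighting rule, which here is the exponentiated-reward weighting common in sampling-based model predictive control. The paper's actual weights and update may differ.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax for a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def importance_weights(rewards, lam):
    """Self-normalized weights proportional to exp(reward / lam).

    Assumption: MPPI-style weighting; lam is a hypothetical temperature.
    """
    return softmax(np.asarray(rewards, dtype=float) / lam)


def sample_reward(pre_logits, W, token_reward, rng):
    """Sample one token from the (perturbed) pre-logits and return its reward."""
    probs = softmax(W @ pre_logits)
    token = rng.choice(len(probs), p=probs)
    return token_reward[token]


def aisp_toy(z, W, token_reward, n_samples=64, n_iters=10,
             sigma=0.5, lam=0.2, seed=0):
    """Toy adaptive importance sampling over the mean of a pre-logit perturbation."""
    rng = np.random.default_rng(seed)
    mu = np.zeros_like(z)  # mean of the Gaussian perturbation
    for _ in range(n_iters):
        # Draw perturbations v_i ~ N(mu, sigma^2 I).
        v = mu + sigma * rng.standard_normal((n_samples, z.size))
        # Score each perturbed pre-logit vector with a sampled reward.
        r = np.array([sample_reward(z + v_i, W, token_reward, rng) for v_i in v])
        # Move the mean toward high-reward perturbations.
        w = importance_weights(r, lam)
        mu = w @ v
    return mu


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.standard_normal((5, 8))                     # toy vocabulary of 5 tokens
    z = rng.standard_normal(8)                          # toy pre-logit vector
    token_reward = np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # only token 4 is rewarded
    mu = aisp_toy(z, W, token_reward)
    print("P(token 4) before:", softmax(W @ z)[4])
    print("P(token 4) after: ", softmax(W @ (z + mu))[4])
```

In this sketch only the perturbation mean is adapted and the model weights are never updated, which is what makes the procedure a test-time method rather than fine-tuning. A real LLM setting would perturb the penultimate-layer outputs during generation and score complete responses with a reward model.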

Originally published by arXiv CS.
