Statistically Efficient Policy Evaluation
Related Articles from SNS
Trajectory Data Suffices for Statistically Efficient Policy Evaluation in Fixed-Horizon Offline RL with Linear $q^\pi$-Realizability and Concentrability
Abstract: We study finite-horizon offline reinforcement learning (RL) with function approximation for both policy evaluation and policy optimization. Prior work established that statistically efficient learning is impossible for either of these problems when the only assumptions are that the data has good coverage (concentrability) and the state-action value function of every policy is linearly realizable ($q^\pi$-realizability) (Foster et al., 2021). Recently, Tkachuk...
Nonparametric LLM Evaluation from Preference Data
arXiv:2601.21816v2. Abstract: Evaluating the performance of large language models (LLMs) from human preference data is crucial for obtaining LLM leaderboards. However, many existing approaches either rely on restrictive parametric assumptions or lack valid uncertainty quantification when flexible machine learning methods are used.
ImmigrationQA: A Source-Grounded Dataset and Small-Model Adaptation for U.S. Immigration Law
Abstract: U.S. immigration law spans thousands of pages of official policy, federal regulations, and procedural guidance that change frequently and carry high stakes for petitioners who lack legal representation. We describe the construction of ImmigrationQA, a source-grounded question-answering dataset of 17,058 pairs across 13 immigration subdomains, and the fine-tuning of a Llama 3.2 3B Instruct model on that dataset using parameter-efficient LoRA. The corpus was...
Efficient Exploration for Iterative Nash Preference Optimization
arXiv:2606.01382v1. Abstract: Preference alignment is central to improving large language models, but standard reward-based formulations can be restrictive when human preferences are cyclic, non-transitive, or otherwise not representable by a scalar reward. Nash Learning from Human Feedback (NLHF) addresses this limitation by modeling alignment as a preference game and targeting a Nash equilibrium rather than a reward maximizer. However, the learning-theoretic foundations...
Policy on the AI Exponential
In one of the side plots to The Lord of the Rings, two of the Hobbits attempt to rouse Treebeard, a wise but ponderous sentient tree, to defend his forest from an army that is cutting it down. The problem is that Treebeard operates at a very different speed than the Hobbits. It takes him a full day simply to say hello to another tree, so getting him and his peers to act fast enough is nearly impossible.
UP’s green renaissance: A future where growth & nature will thrive
Today, the world is facing the unprecedented challenge of climate change. Rising global temperatures, erratic monsoons, drying rivers, declining groundwater resources, air pollution, and the loss of biodiversity have emerged as serious threats to human life, economic prosperity, and social stability. Floods, droughts, heatwaves, and other extreme weather events are becoming increasingly frequent across different parts of the world.
CS336: Language Modeling from Scratch
Course Staff Logistics - Lectures: Monday/Wednesday 3:00-4:20pm in Skilling Auditorium - Recordings: YouTube playlist - Office hours: - Percy Liang: Fridays 11am-12pm in Gates 366 - Tatsu Hashimoto: Tuesdays 11-12am in Gates 364 - Marcel Rød: Tuesdays 4:30-5:30pm in Gates 498, Wednesdays 4:30-5:30pm in Gates 415 - Herman Brunborg: Wednesdays 1:30-2:30pm, Fridays 1:30-2:30pm, location Gates 392 - Steven Cao: Mondays 4:30-5:30pm, Thursdays 9:30-10:30am, Gates 200 - Contact: Students should ask...
As Indonesia tightens spending, Prabowo's travel-heavy diplomacy comes under scrutiny
Critics question whether Indonesian President Prabowo Subianto’s overseas trips are producing measurable economic and political returns at a time when the administration is championing spending cuts and budget efficiency at home. SINGAPORE/JAKARTA: Indonesian President Prabowo Subianto's latest and fourth trip to France in late May has reignited public scrutiny over his travels, particularly...