Statistically Efficient Policy Evaluation

Related Articles from SNS

Trajectory Data Suffices for Statistically Efficient Policy Evaluation in Fixed-Horizon Offline RL with Linear $q^\pi$-Realizability and Concentrability

We study finite-horizon offline reinforcement learning (RL) with function approximation for both policy evaluation and policy optimization. Prior work established that statistically efficient learning is impossible for either of these problems when the only assumptions are that the data has good coverage (concentrability) and the state-action value function of every policy is linearly realizable ($q^\pi$-realizability) (Foster et al., 2021). Recently, Tkachuk...

arXiv CS 8d ago

Nonparametric LLM Evaluation from Preference Data

arXiv:2601.21816v2. Evaluating the performance of large language models (LLMs) from human preference data is crucial for obtaining LLM leaderboards. However, many existing approaches either rely on restrictive parametric assumptions or lack valid uncertainty quantification when flexible machine learning methods are used.

arXiv CS 1d ago

ImmigrationQA: A Source-Grounded Dataset and Small-Model Adaptation for U.S. Immigration Law

U.S. immigration law spans thousands of pages of official policy, federal regulations, and procedural guidance that change frequently and carry high stakes for petitioners who lack legal representation. We describe the construction of ImmigrationQA, a source-grounded question-answering dataset of 17,058 pairs across 13 immigration subdomains, and the fine-tuning of a Llama 3.2 3B Instruct model on that dataset using parameter-efficient LoRA. The corpus was...

arXiv CS 9d ago

Efficient Exploration for Iterative Nash Preference Optimization

arXiv:2606.01382v1. Preference alignment is central to improving large language models, but standard reward-based formulations can be restrictive when human preferences are cyclic, non-transitive, or otherwise not representable by a scalar reward. Nash Learning from Human Feedback (NLHF) addresses this limitation by modeling alignment as a preference game and targeting a Nash equilibrium rather than a reward maximizer. However, the learning-theoretic foundations...

arXiv CS 8d ago

Policy on the AI Exponential

Policy on the AI Exponential In one of the side plots to The Lord of the Rings, two of the Hobbits attempt to rouse Treebeard—a wise but ponderous sentient tree—to defend his forest from an army that is cutting it down. The problem is that Treebeard operates at a very different speed than the Hobbits. It takes him a full day simply to say hello to another tree, so getting him and his peers to act fast enough is nearly impossible.

Hacker News 1h ago

UP’s green renaissance: A future where growth & nature will thrive

Today, the world is facing the unprecedented challenge of climate change. Rising global temperatures, erratic monsoons, drying rivers, declining groundwater resources, air pollution, and the loss of biodiversity have emerged as serious threats to human life, economic prosperity, and social stability. Floods, droughts, heatwaves, and other extreme weather events are becoming increasingly frequent across different parts of the world.

Times of India 6d ago

CS336: Language Modeling from Scratch

Course Staff Logistics - Lectures: Monday/Wednesday 3:00-4:20pm in Skilling Auditorium - Recordings: YouTube playlist - Office hours: - Percy Liang: Fridays 11am-12pm in Gates 366 - Tatsu Hashimoto: Tuesdays 11-12am in Gates 364 - Marcel Rød: Tuesdays 4:30-5:30pm in Gates 498, Wednesdays 4:30-5:30pm in Gates 415 - Herman Brunborg: Wednesdays 1:30-2:30pm, Fridays 1:30-2:30pm, location Gates 392 - Steven Cao: Mondays 4:30-5:30pm, Thursdays 9:30-10:30am, Gates 200 - Contact: Students should ask...

Hacker News 9d ago

As Indonesia tightens spending, Prabowo's travel-heavy diplomacy comes under scrutiny

Critics question whether Indonesian President Prabowo Subianto’s overseas trips are producing measurable economic and political returns at a time when the administration is championing spending cuts and budget efficiency at home. SINGAPORE/JAKARTA: Indonesian President Prabowo Subianto's latest and fourth trip to France in late May has reignited public scrutiny over his travels, particularly...

Channel News Asia 5d ago