Nash Learning

Related Articles from SNS

Efficient Exploration for Iterative Nash Preference Optimization

arXiv:2606.01382v1 Announce Type: new Abstract: Preference alignment is central to improving large language models, but standard reward-based formulations can be restrictive when human preferences are cyclic, non-transitive, or otherwise not representable by a scalar reward. Nash Learning from Human Feedback (NLHF) addresses this limitation by modeling alignment as a preference game and targeting a Nash equilibrium rather than a reward maximizer. However, the learning-theoretic foundations...

arXiv CS 8d ago

Fairness in two-player zero-sum games with bandit feedback

Announce Type: new Abstract: We study two-player zero-sum games (TPZSGs) with bandit feedback under fairness constraints requiring every action to be played with probability at least $\alpha/m$. Existing instance-dependent results target $\textit{pure}$ Nash equilibria, while fairness generically produces $\textit{mixed}$ equilibria, a harder learning target. Our key technical tool is a reparametrization: every fair strategy decomposes as $p = (\alpha/m)\mathbf{1} + (1-\alpha)\widetilde{p}$...

arXiv CS 8d ago
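
The reparametrization quoted in the abstract above can be checked numerically. The sketch below assumes that $\widetilde{p}$ is an arbitrary probability vector over the $m$ actions (the abstract is truncated before it says so), and the helper name `fair_strategy` is hypothetical. It illustrates only why the decomposition satisfies the fairness floor, not the paper's learning algorithm.

```python
def fair_strategy(alpha, p_tilde):
    """Return p = (alpha/m) * 1 + (1 - alpha) * p_tilde.

    Assumption: p_tilde is a probability vector over m actions and
    0 <= alpha <= 1. The function name is hypothetical.
    """
    m = len(p_tilde)
    return [alpha / m + (1.0 - alpha) * q for q in p_tilde]


alpha = 0.3
p_tilde = [1.0, 0.0, 0.0]  # a pure strategy, chosen only for illustration
p = fair_strategy(alpha, p_tilde)

# The entries sum to alpha + (1 - alpha) = 1, and each entry is at least
# alpha/m because (1 - alpha) * q is non-negative.
total = sum(p)
floor = alpha / len(p_tilde)
print(p, total, floor)
```

Even when $\widetilde{p}$ is pure, the resulting $p$ puts positive mass on every action, which is consistent with the abstract's remark that fairness generically produces mixed equilibria.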

Population-Aware Imitation Learning in Mean-field Games with Common Noise

arXiv:2605.03357v2 Announce Type: replace Abstract: Mean Field Games (MFGs) provide a powerful framework for modeling the collective behavior of large populations of interacting agents. In this paper, we address the problem of Imitation Learning (IL) in MFGs subject to common noise, where the population distribution evolves stochastically. This stochasticity compels agents to adopt population-aware policies to respond to aggregate shocks.

arXiv CS 1d ago

A No-Regret Framework for Adaptive Incentive Design

arXiv:2606.02529v1 Announce Type: cross Abstract: Incentive design studies how a central authority can influence strategic agents through payments, subsidies, or taxes, so that individual objectives align with collective welfare. This paper introduces a No-Regret Adaptive Incentive Design (RAID) framework for nonlinear games with continuous action spaces and private agent costs. In this framework, the authority (planner) designs incentives that regulate the Nash equilibrium toward a socially...

arXiv CS 8d ago

Learning to Strategically Acquire Resources in Competition

arXiv:2606.06882v1 Announce Type: new Abstract: We consider multiple agents competing to acquire some costly divisible resource (e.g. shares of a financial asset, compute resources, etc.). Leveraging a standard model for price dynamics, we propose a novel game-theoretic model for this problem, generalizing settings studied in diverse literatures. Our analysis considers different assumptions on the information available to agents.

arXiv CS 2d ago

Mean-based algorithms: A lower bound and regret

Announce Type: new Abstract: Mean-based algorithms are a class of online learning algorithms that assign low probability to actions with low average rewards. Recent work indicates these algorithms converge favorably to serially undominated actions, which approximate Nash equilibria in economic games. However, empirical studies also show slower convergence compared to established algorithms in bandit-feedback scenarios.

arXiv CS 6d ago

Conditional Graph Diffusion for Negotiation Support: Overcoming Discrete Infeasibility and Preference Elicitation Gaps

arXiv:2606.02209v1 Announce Type: new Abstract: Traditional bilateral negotiation support systems search over discrete allocation spaces. This approach encounters structural infeasibility when no discrete outcome satisfies individual rationality. It fails to incorporate preference signals embedded in natural language dialogue.

arXiv CS 8d ago

Should Demand Models Incorporate Competitor Prices? Oblivious Learning and Algorithmic Collusion

arXiv:2606.05363v2 Announce Type: replace Abstract: On a platform with many sellers, should a pricing algorithm explicitly model competitors' prices when learning demand? Classical learning arguments suggest an affirmative answer: ignoring competitors induces model misspecification and inefficiency. In contrast, recent work on algorithmic collusion suggests that strategic obliviousness -- deliberately ignoring competitor prices -- may facilitate collusive outcomes and improve profits.

arXiv CS 1d ago

ExoMars rover targets vast bed of clay in search for life

Lisa Lock, Scientific Editor; Robert Egan, Associate Editor. In the region where the ExoMars Rosalind Franklin rover will search for signs of life, clay deposits extend beyond previous estimates, a new study finds. One hypothesis even suggests a vast ocean once covered the landing site. Clay minerals require liquid water to form and hold clues of a time when the red planet was wetter and more hospitable to life.

Phys.org 6d ago