Home Knowledge Base NLHF

NLHF

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Efficient Exploration for Iterative Nash Preference Optimization

arXiv:2606.01382v1 Announce Type: new Abstract: Preference alignment is central to improving large language models, but standard reward-based formulations can be restrictive when human preferences are cyclic, non-transitive, or otherwise not representable by a scalar reward. Nash Learning from Human Feedback (NLHF) addresses this limitation by modeling alignment as a preference game and targeting a Nash equilibrium rather than a reward maximizer. However, the learning-theoretic foundations...

arXiv CS 8d ago