
Semiparametric Preference Optimization: Your Language Model

Related Articles from SNS

Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model

arXiv:2512.21917v3 (Announce Type: replace)

Abstract: Policy alignment to preference data typically assumes a known link function between observed preferences and latent rewards (e.g., the Bradley-Terry model with its logistic link). Misspecification of this link can bias the inferred rewards and misalign the learned policies. We study policy alignment under an unknown and unrestricted link function.

arXiv CS 6d ago
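The abstract's concern about link misspecification can be made concrete with a small numerical sketch. It assumes the common setup in which the probability that response y1 is preferred over y2 is g(r1 - r2) for latent rewards r1, r2 and a link g; the Bradley-Terry model takes g to be the logistic function. The probit link (standard normal CDF) used here is a hypothetical stand-in for an unknown true link, and all helper names are illustrative, not taken from the paper. The sketch generates a preference rate under the probit link, then inverts the logistic link as a Bradley-Terry fit would, showing how the inferred reward gap drifts from the true one.

```python
import math
from statistics import NormalDist

# Illustrative sketch only (not the paper's method):
# P(y1 preferred over y2) = link(r1 - r2) for latent rewards r1, r2.

def logistic_link(delta: float) -> float:
    """Bradley-Terry link: the logistic sigmoid of the reward gap."""
    return 1.0 / (1.0 + math.exp(-delta))

def probit_link(delta: float) -> float:
    """Hypothetical 'true' link for this demo: the standard normal CDF."""
    return NormalDist().cdf(delta)

def reward_gap_under_logistic(p: float) -> float:
    """Reward gap a Bradley-Terry fit infers from preference rate p (the logit)."""
    if not 0.0 < p < 1.0:
        raise ValueError("preference rate must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def reward_gap_under_probit(p: float) -> float:
    """Reward gap recovered by inverting the (assumed) true probit link."""
    return NormalDist().inv_cdf(p)

if __name__ == "__main__":
    true_gap = 1.0
    # Preference rate generated by the assumed true (probit) link.
    p = probit_link(true_gap)
    print(f"observed preference rate: {p:.4f}")
    print(f"gap recovered with the true link: {reward_gap_under_probit(p):.4f}")
    print(f"gap inferred assuming a logistic link: {reward_gap_under_logistic(p):.4f}")
```

With a true gap of 1.0, the logistic inversion returns roughly 1.67. Both links are increasing, so the sign of the gap (which response is better) survives the misspecification; its magnitude does not.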