PAFO: Pareto Fairness Optimization for Personalized Reward Modeling

arXiv CS, Tuesday 09 June 2026, 04:00 UTC
By Xiaoyan Zhao, Haoting Ni, Yang Zhang, Chunyuan Zheng, Haoxuan Li, Fuli Feng

Key Points

arXiv:2606.07988v1

Abstract: Large language models (LLMs) increasingly rely on reward models to align their outputs with diverse user preferences. While personalized reward models aim to capture such heterogeneity, they are often trained on imbalanced user preference data and may therefore favor users whose preferences are more common in the training population. In this paper, we identify this failure mode as personalized reward bias, where reward modeling quality varies systematically with preference support rate. We formulate its mitigation as a Pareto fairness problem over group utilities, aiming to improve under-served users without degrading other user groups. To this end, we propose PAFO, a Pareto fairness optimization framework for personalized reward modeling. PAFO first trains group-specialized reward models for majority and minority preference groups, then constructs conditional margin-level supervision to distill their heterogeneous preference boundaries into a single unified model. The resulting model uses group information only during training and requires no explicit group labels at inference time. Experiments on Personal-LLM and DSP show that PAFO improves both minority-group and majority-group accuracy while reducing user-level unfairness across multiple metrics, demonstrating its effectiveness for fairer LLM personalization.
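The abstract gives no formulas, so the following is only a minimal sketch of what group-conditional margin distillation could look like, not the authors' implementation. It assumes a Bradley-Terry pairwise loss for the unified (student) model and a squared-error term pulling the student's reward margin toward the margin of whichever group-specialized teacher matches the training example. The function names, the `alpha` weight, and the squared-error form are all hypothetical choices made for illustration.

```python
import math


def bt_loss(margin):
    """Bradley-Terry pairwise loss, -log(sigmoid(margin)).

    `margin` is r(chosen) - r(rejected). Computed in a numerically
    stable way for large positive or negative margins.
    """
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def teacher_margin(group, majority_margin, minority_margin):
    """Select the margin from the teacher specialized for this group.

    ASSUMPTION: "conditional" supervision is read here as conditioning
    on the example's group label, which is available only in training.
    """
    return majority_margin if group == "majority" else minority_margin


def distill_loss(student_margin, group, majority_margin, minority_margin,
                 alpha=0.5):
    """Hypothetical combined loss for the unified student model.

    alpha weights the preference (Bradley-Terry) term against the
    margin-matching term; both the weight and the squared-error form
    are illustrative assumptions.
    """
    target = teacher_margin(group, majority_margin, minority_margin)
    margin_term = (student_margin - target) ** 2
    return alpha * bt_loss(student_margin) + (1 - alpha) * margin_term


# Toy batch: (student margin, group, majority-teacher margin, minority-teacher margin)
batch = [
    (0.8, "majority", 1.2, -0.4),
    (0.1, "minority", 1.5, 0.9),
]
mean_loss = sum(distill_loss(*example) for example in batch) / len(batch)
print(f"mean distillation loss: {mean_loss:.4f}")
```

In this sketch the group label appears only in the loss. The student scores a response pair without it, which mirrors the abstract's statement that no explicit group labels are needed at inference time.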


Originally published by arXiv CS.
