RLHF May Not Reflect Genuine Preferences

arXiv CS, Tuesday 02 June 2026, 04:00 UTC. By Bijean Ghafouri, Eun Cheol Choi, Priyanka Dey, Emilio Ferrara

Key Points

arXiv:2604.03238v2. Abstract: Reinforcement Learning from Human Feedback (RLHF) assumes that annotation responses reflect genuine human preferences. They often do not. Behavioral scientists have documented for sixty years that people produce responses without holding genuine opinions, construct preferences on the spot from contextual cues, and interpret identical questions differently. Importantly, these failures are common for the judgments on values that matter most for AI alignment. We argue that measurement validity is logically prior to preference aggregation. Before asking how to combine annotations, the field must ask whether the responses being combined are preferences at all. We organize annotation responses along a spectrum, from non-attitudes (no signal) to genuine preferences (full signal), and develop diagnostics that locate responses on this spectrum. In two RLHF datasets, we show that inconsistency is systematic and directionally biased. Filtering high-inconsistency annotators flips majority harm classifications for 18.6% of prompts and shifts mean ratings by over 13 points on a 100-point scale. As such, much of the current RLHF practice models noise as signal and elicitation artifacts as human values.
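This summary does not describe the paper's actual diagnostics, so the following is only a toy sketch of what "filtering high-inconsistency annotators flips majority classifications" could look like in practice. It assumes each record is an `(annotator, prompt, harmful)` tuple, that inconsistency is measured as the share of an annotator's repeated prompts on which they contradicted themselves, and that a prompt counts as harmful under a strict majority vote. The function names and the 0.5 threshold are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def inconsistency_by_annotator(records):
    """Share of each annotator's repeated prompts with conflicting labels.

    Assumption: annotators with no repeated prompts score 0.0.
    """
    labels = defaultdict(list)
    for annotator, prompt, harmful in records:
        labels[(annotator, prompt)].append(harmful)

    repeated = defaultdict(int)
    conflicting = defaultdict(int)
    for (annotator, _prompt), given in labels.items():
        repeated[annotator] += 0  # make sure every annotator gets an entry
        if len(given) > 1:
            repeated[annotator] += 1
            if len(set(given)) > 1:
                conflicting[annotator] += 1
    return {a: (conflicting[a] / n if n else 0.0) for a, n in repeated.items()}


def majority_harm(records, keep=None):
    """Strict-majority harm label per prompt, optionally restricted to `keep`."""
    votes = defaultdict(list)
    for annotator, prompt, harmful in records:
        if keep is None or annotator in keep:
            votes[prompt].append(harmful)
    return {p: sum(v) > len(v) / 2 for p, v in votes.items()}


def flip_rate(records, threshold=0.5):
    """Fraction of prompts whose majority label changes after filtering.

    Hypothetical rule: drop annotators whose inconsistency exceeds `threshold`.
    Only prompts that still have at least one vote after filtering are counted.
    """
    scores = inconsistency_by_annotator(records)
    keep = {a for a, s in scores.items() if s <= threshold}
    before = majority_harm(records)
    after = majority_harm(records, keep)
    shared = [p for p in before if p in after]
    flipped = sum(before[p] != after[p] for p in shared)
    return flipped / len(shared) if shared else 0.0
```

Calling `flip_rate(records)` on a list of such tuples returns a value between 0 and 1, comparable in spirit (not in method) to the 18.6% figure reported in the abstract.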


Originally published by arXiv CS

