BIAS
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure
arXiv:2605.27996v2 Announce Type: replace Abstract: Single-axis mitigations of reward-model biases (e.g., reducing proxy reliance on length, sycophancy, or style) can rotate optimization pressure onto correlated proxies rather than eliminate it, a failure mode we call reward bias substitution. The failure is enabled by a measurement-versus-optimization gap between audit and policy-induced distributions during mitigation evaluation and policy training. We formalize mitigation outcomes into a...
One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models
arXiv:2603.03291v2 Announce Type: replace Abstract: Reward Models (RMs) are crucial for online alignment of language models (LMs) with human preferences. However, RM-based preference-tuning is vulnerable to reward hacking, whereby LM policies learn undesirable behaviors from flawed RMs. By systematically measuring biases in five high-quality RMs, including the state-of-the-art, we find that issues persist despite prior work with respect to length, sycophancy, and overconfidence.
BIAS-ID: A Framework for Analyzing Transformation Biases in AI-Generated Image Detectors
arXiv:2605.31153v1 Announce Type: new Abstract: Given the surge of harmful AI-generated imagery online, reliably distinguishing authentic images from generated ones has become an urgent research topic. While many proposed detection methods perform well under controlled settings, they often collapse when tested on real-world data.
Side-by-side Comparison Amplifies Dialect Bias in Language Models
Announce Type: replace Abstract: Language models (LMs) can exhibit biases based on variations in their dialects, even in the absence of a dialect label, a behavior known as covert dialect bias. In this work, we quantify covert dialect bias in online discourse by evaluating how LMs associate stereotypical traits (derived from social psychology research on racial bias) with intent-equivalent tweets in Standard American English (SAE) and African-American Vernacular English (AAVE). While prior...
IndoBias: A Dual Track Culturally Grounded Benchmark for LLMs Bias Evaluation in Indonesian Languages
arXiv:2606.01260v1 Announce Type: new Abstract: Despite being home to more than 1300 ethnic groups and 700 indigenous languages, bias in Large Language Models has not been fully studied in Indonesia, thus leaving a critical gap in evaluating representational fairness and localized stereotypes within its uniquely vast, multilingual, and diverse sociocultural landscape. To address this, we introduce IndoBias as a culturally-grounded bias benchmark to assess LLMs bias in Indonesian and three...
Biases in the Blind Spot: Detecting What LLMs Fail to Mention
Announce Type: replace Abstract: Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these unverbalized biases. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefined categories and hand-crafted datasets.
Moral Sensitivity in LLMs: A Tiered Evaluation of Contextual Bias via Behavioral Profiling and Mechanistic Interpretability
arXiv:2605.03217v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly deployed in settings that require nuanced ethical reasoning, yet existing bias evaluations treat model outputs as simply "biased" or "unbiased." This binary framing misses the gradual, context-sensitive way bias actually emerges. We address this gap in two stages: behavioral profiling and mechanistic validation.
Towards a holistic understanding of Selection Bias for Causal Effect Identification
Announce Type: replace-cross Abstract: Selection bias is pervasive in observational studies. For example, large scale biobanks data can exhibit ``healthy volunteer bias'' when respondents are healthier and of higher socio-economic status than the population they are meant to represent. Recovering causal effects from such sub-population is an important problem in causal inference, as estimating average treatment effects (ATE) from selected populations can result in a severely biased estimate...
A thalamus–brainstem attractor network drives history-biased decisions
Abstract Natural environments often change gradually, making it adaptive to bias decisions on the basis of the recent past — a phenomenon known as serial dependence1,2,3. Large-scale recordings during behaviour have identified that serial dependence is a common motif for decision-making, with neural representations of past experiences found throughout the brain4,5,6,7,8,9,10,11. However, it remains unclear whether this bias arises from dedicated neural circuits with history-specific...
Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents
arXiv:2606.01584v1 Announce Type: new Abstract: Conversational tutoring agents have been shown to improve learning engagement and student outcomes, and large language models (LLMs) are increasingly used in these systems to provide scalable, personalized feedback. However, LLMs may perpetuate or amplify stereotypical social biases, posing particular risks in educational settings. In this study, we evaluate LLMs in conversational tutoring scenarios to identify high-confidence social biases,...