Bradley-Terry
Related Articles from SNS
Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies
arXiv:2606.07492v1 Announce Type: new Abstract: The ranking of recommendation algorithms is a challenging problem since model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This drives a demand for a proper methodology for fair comparison between algorithms.
Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model
arXiv:2512.21917v3 Announce Type: replace Abstract: Policy alignment to preference data typically assumes a known link function between observed preferences and latent rewards (e.g., Bradley-Terry model / logistic link). Misspecification of this link can bias inferred rewards and misalign learned policies. We study policy alignment under an unknown and unrestricted link function.
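For readers unfamiliar with the "known link function" this abstract refers to, here is a minimal sketch of the standard Bradley-Terry (logistic) link, not the paper's method. The model takes the probability that response a is preferred over response b to be the sigmoid of the gap between their latent rewards. The function names and the sample reward values are hypothetical, chosen only for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bt_preference_prob(reward_a: float, reward_b: float) -> float:
    """P(a preferred over b) under the Bradley-Terry / logistic link."""
    return math.exp(log_sigmoid(reward_a - reward_b))


def bt_negative_log_likelihood(pairs) -> float:
    """Negative log-likelihood of observed preferences.

    `pairs` is an iterable of (reward_of_winner, reward_of_loser) tuples;
    the rewards are assumed to be given latent scores (illustrative only).
    """
    return -sum(log_sigmoid(w - l) for w, l in pairs)


if __name__ == "__main__":
    # Hypothetical latent rewards for two responses.
    print(bt_preference_prob(1.0, 0.0))
    print(bt_negative_log_likelihood([(1.0, 0.0), (0.5, 0.5)]))
```

Only the reward gap enters the probability, so adding a constant to every reward leaves all preference probabilities unchanged. Replacing `log_sigmoid` with a different link is where a misspecified link, as discussed in the abstract, would enter.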
Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling
arXiv:2602.10623v2 Announce Type: replace Abstract: Reward models learned from human preferences are central to aligning large language models (LLMs) via reinforcement learning from human feedback, yet they are often vulnerable to reward hacking due to noisy annotations and systematic biases such as response length or style. We propose Bayesian Non-Negative Reward Model (BNRM), a principled reward modeling framework that integrates non-negative factor analysis into Bradley-Terry (BT)...
S-SPPO: Semantic-Calibrated Self-Play Preference Optimization
arXiv:2606.01561v1 Announce Type: new Abstract: Aligning Large Language Models (LLMs) with human preferences is often formulated via Direct Preference Optimization (DPO). However, the standard Bradley-Terry instantiation of DPO is limited in modeling common departures from transitivity in human preferences. To address this, recent work has introduced Self-Play Preference Optimization (SPPO), which iteratively refines the policy by training on self-generated win-lose pairs.
Which sparkling water is the best?
The Sparkling Water Report: three minds and gullets looking for the winning bubbles. With my friends Manuel and Aurélien, also friends of the fizz, we set out to find which sparkling water is the best. We limited ourselves to ones that you could readily buy in Paris, up to the limit of what we could carry. This means 14 waters, blind tested: each water was poured into an opaque glass associated with a number; the glasses were then shuffled and turned to face away from the drinkers.
Differentially Private Preference Data Synthesis for Large Language Model Alignment
arXiv:2605.30808v1 Announce Type: new Abstract: Preference alignment is a crucial post-training step for large language models (LLMs) to ensure their outputs align with human values. However, post-training on real human preference data raises privacy concerns, as these datasets often contain sensitive user prompts and human judgments. To address this, we propose DPPrefSyn, a novel algorithm for generating differentially private (DP) synthetic preference data to enable privacy-preserving...
DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity
arXiv:2606.09043v1 Announce Type: new Abstract: Reward models trained from pairwise preferences often exploit superficial shortcut cues rather than learning true response quality. We propose DynaCF, a dynamic reweighting framework for mitigating shortcut learning in reward model training. Unlike static shortcut heuristics, DynaCF measures shortcut sensitivity online during optimization by applying semantics-preserving counterfactual perturbations and tracking the resulting margin shifts and...
Rethinking Sales Lead Scoring with LLM-based Hierarchical Preference Ranking
arXiv:2606.04387v1 Announce Type: new Abstract: Sales lead conversion in high-stakes domains (e.g., automotive, real estate) differs fundamentally from e-commerce recommendation due to prolonged decision cycles and multi-stage funnels. Traditional lead scoring methods (rule-based scorecards, machine learning, or pointwise CTR models) face severe challenges: sparse supervision, a semantic gap in unstructured CRM logs, and an inability to capture relative lead priority.
Capturing LLM Capabilities via Evidence-Calibrated Query Clustering
arXiv:2605.17110v2 Announce Type: replace Abstract: Query clustering organizes queries into groups that reflect shared latent capability demands, enabling capability-aware LLM evaluation. Existing clustering methods, which primarily rely on semantic taxonomies or embeddings, often fail to capture such latent capability requirements due to a misalignment between surface-level semantics and actual model performance. We propose ECC, an algorithm that calibrates prior semantic embeddings using...
HumorRank: A Tournament-Based Leaderboard for Evaluating Humor Generation in Large Language Models
arXiv:2604.19786v2 Announce Type: replace Abstract: Humor remains difficult to evaluate in large language models (LLMs) because what makes a response funny is subjective, comparative, and shaped by interacting comedic mechanisms rather than a single scalar property. Existing humor evaluation protocols therefore tend to produce isolated scores or task-specific judgments that are difficult to compare across models. We introduce HumorRank, a tournament-based framework for ranking textual humor...