Multi-Objective Preference Optimization
Related Articles from SNS
Multi-Objective Bayesian Optimization via Adaptive $\varepsilon$-Constraints Decomposition
arXiv:2604.15959v2 Announce Type: replace Abstract: Multi-objective Bayesian optimization (MOBO) provides a principled framework for optimizing multiple expensive black-box functions. However, existing MOBO methods often struggle with coverage, scalability, and handling constraints and preferences. In this work we propose STAGE-BO, Sequential Targeting Adaptive Gap-Filling $\varepsilon$-Constraint Bayesian Optimization: by analyzing the coverage of the surrogate Pareto front, our method...
MAdam: Metric-Aware Multi-Objective Adam
Announce Type: new Abstract: Multi-objective optimization (MOO) underlies many machine learning problems, yet MOO solvers across the loss-balancing, gradient-balancing, and Pareto-based families almost universally hand their reconciled directions to Adam (Kingma and Ba, 2015). We show this coupling introduces two systematic gaps between the solver's intent and the optimizer's execution. The first is a weighting mismatch: Adam's second-moment denominator entangles the time-varying...
Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models
Announce Type: replace Abstract: Post-training LLMs with RLHF and preference optimization methods (e.g., DPO, IPO) has greatly improved alignment, yet these approaches assume a single objective. In reality, humans express multiple, often conflicting objectives, such as helpfulness and harmlessness, with no natural scalarization. We study the multi-objective preference alignment problem, where a policy must balance several objectives simultaneously.
FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models
Announce Type: replace Abstract: Aligning Large Language Models (LLMs) with human values often involves balancing multiple, conflicting objectives such as helpfulness and harmlessness. Training these models is computationally intensive, and centralizing the process raises significant data privacy concerns. Federated Learning (FL) offers a compelling alternative, but existing Federated Multi-Objective Optimization (FMOO) methods face severe communication bottlenecks as their reliance on...
Provably Efficient Personalized Multi-Objective Bandits with Proactive Conversational Queries
Announce Type: new Abstract: Personalized decision-making in multi-objective bandits requires learning user-specific trade-offs among competing objectives. Since arm utility depends on both unknown rewards and unknown preferences, existing methods infer preferences only from utility feedback, entangling preference learning with reward exploration. In practice, however, users often reveal their priorities through proactive conversational queries (e.g., "cheap and clean hotel"), yet this...
CoAction: Cross-task Correlation-aware Pareto Set Learning
Announce Type: replace Abstract: Pareto set learning (PSL) is an emerging paradigm in multi-objective optimization that trains neural networks to map preference vectors to Pareto optimal solutions. However, existing PSL methods primarily focus on solving a single multi-objective optimization problem at a time. This limitation not only increases computational costs in multi-objective multitask optimization scenarios by requiring a separate model for each task, but also fails to exploit the...
Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives
arXiv:2601.01665v2 Announce Type: replace Abstract: Deep reinforcement learning (DRL) has shown great promise in addressing multi-objective combinatorial optimization problems (MOCOPs). Nevertheless, the robustness of these learning-based solvers has remained insufficiently explored, especially across diverse and complex problem distributions. In this paper, we propose a unified robustness-oriented framework for preference-conditioned DRL solvers for MOCOPs.
A Game-Theoretic Decision Framework for Optimal Selection of Coordination Detection Methods in Multi-UAV Fleet Operations
arXiv:2606.02383v1 Announce Type: new Abstract: Detecting coordination among unmanned aerial vehicle (UAV) fleets operating in shared airspace and identifying the route-lead aircraft whose navigation decisions govern fleet behavior presents a fundamental speed-accuracy trade-off: fast methods enable real-time traffic management but sacrifice detection fidelity, while accurate methods may exceed the time budget for actionable airspace deconfliction. This paper presents a game-theoretic...
BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization
arXiv:2606.04807v1 Announce Type: new Abstract: Mitigating social bias in Large Language Models (LLMs) presents a distinct alignment challenge: unlike verifiable tasks, bias lacks a single ground truth, creating a high-variance, subjective reward landscape. Previous preference-based fine-tuning methods have major trade-offs: Direct Preference Optimization (DPO) is limited by the lack of exploration inherent in offline training, while Proximal Policy Optimization (PPO) can lead to training...