Contrastive Preference Optimization
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Learning from Fine-Grained Visual Discrepancies: Mitigating Multimodal Hallucinations via In-Context Visual Contrastive Optimization
arXiv:2605.31312v1 Announce Type: new Abstract: Multimodal hallucination remains a persistent challenge for Vision-Language Models (VLMs). Standard textual Direct Preference Optimization (DPO) often fails to mitigate it due to a lack of explicit visual supervision. While existing works introduce visual preference DPO by contrasting original images against negative ones, they suffer from a theoretically inconsistent objective caused by partition function mismatches and rely on coarse-grained...
FlowPRO: Reward-Free Reinforced Fine-Tuning of Flow-Matching VLAs via Proximalized Preference Optimization
Announce Type: new Abstract: Post-training Vision-Language-Action (VLA) models into policies that can be reliably deployed on real robots remains a major bottleneck. SFT and DAgger exploit failure signals only indirectly, and reward-based RL is bottlenecked by the difficulty of real-world reward design and of training reliable critics. We present FlowPRO, a reward-free offline reinforced fine-tuning framework for flow-matching VLAs.
Semantic Retrieval for Product Search in E-Commerce
arXiv:2606.01504v1 Announce Type: new Abstract: Semantic retrieval in e-commerce must handle short, noisy, and colloquial queries over large product catalogs with fine-grained attribute distinctions. We present a Siamese LLM dual-encoder trained through a two-stage pipeline: contrastive learning with a false-negative margin mask to prevent penalization of near-duplicate products, followed by Relative Odds Alignment for Retrieval (ROAR), a preference optimization objective that extends...
Maximizing Mutual Information Between Prompt and Response Improves LLM Performance With No Additional Data
arXiv:2603.19294v4 Announce Type: replace Abstract: While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains heavily rely on human-labeled data or external verifiers. Existing data has already been exploited, and new data is expensive to collect. Moreover, true intelligence goes far beyond verifiable tasks.
Non-obvious Manipulability in the Additively Separable Group Activity Selection Problem
new Abstract: In this work, we study the additively separable Group Activity Selection Problem (AS-GASP) in an imperfect information setting, where agents have private preferences over activities and weights over other agents. Our goal is to design mechanisms that assign agents to activities based on their declared preferences and weights, with the objective of maximizing social welfare while ensuring truthful reporting. We, therefore, focus on the notion of non-obvious manipulability (NOM),...
Synthetic Contrastive Reasoning for Multi-Table Q&A
Announce Type: new Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how answers are derived. To address this gap, we construct a synthetic contrastive reasoning-trace dataset for MMQA by generating validated positive traces and plausible negative...
ZIPP:Zero-shot Image Personalization from Personas
arXiv:2606.08841v1 Announce Type: new Abstract: Text-to-image diffusion models are increasingly deployed in open-ended creative contexts, yet their outputs remain impersonal, optimized for aggregate aesthetics rather than individual taste. Human preferences are pluralistic: one user favoring muted, nostalgic portraits may prefer vibrant street photography, while another gravitates toward dreamy film aesthetics. Existing methods require dense interaction histories or per-user fine-tuning,...
Mixture-of-Experts Knowledge Graph Retrieval-Augmented Generation for Multi-Agent LLM-based Recommendation
Announce Type: replace Abstract: Large language models (LLMs) have recently been adopted for recommendations due to their ability to understand user intent and item semantics. However, LLM-based recommender systems often rely on parametric knowledge and suffer from outdated knowledge, motivating knowledge graph retrieval-augmented generation (KG-RAG) to ground recommendations on structured, up-to-date KGs. Despite this promise, effective KG-RAG in recommendations faces great challenges.
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
arXiv:2605.30940v1 Announce Type: cross Abstract: Real-time and accurate spatial audio generation is pivotal for delivering an immersive experience. However, existing spatial audio synthesis technologies are often encumbered by a tradeoff between generation quality and high inference latency, as well as difficulty in capturing precise spatial information from multimodal inputs. To address these challenges, we propose SwanSphere, a unified streaming framework for high-fidelity spatial audio...
Trump threatens consequences after Iran shoots down US helicopter
President Donald Trump announced that Iran shot down an American Apache helicopter late on Monday, promising consequences for the country on Tuesday as peace negotiations stall and an April ceasefire appears increasingly under strain. “There were two pilots involved, both are safe and uninjured,” he wrote on Truth Social. “Nevertheless, the United States must, of necessity, respond to this attack.”