Contrastive Preference Optimization

Related Articles from SNS

Learning from Fine-Grained Visual Discrepancies: Mitigating Multimodal Hallucinations via In-Context Visual Contrastive Optimization

arXiv:2605.31312v1. Announce Type: new. Abstract: Multimodal hallucination remains a persistent challenge for Vision-Language Models (VLMs). Standard textual Direct Preference Optimization (DPO) often fails to mitigate it due to a lack of explicit visual supervision. While existing works introduce visual preference DPO by contrasting original images against negative ones, they suffer from a theoretically inconsistent objective caused by partition function mismatches and rely on coarse-grained...

arXiv CS 9d ago

FlowPRO: Reward-Free Reinforced Fine-Tuning of Flow-Matching VLAs via Proximalized Preference Optimization

Announce Type: new. Abstract: Post-training Vision-Language-Action (VLA) models into policies that can be reliably deployed on real robots remains a major bottleneck. SFT and DAgger exploit failure signals only indirectly, and reward-based RL is bottlenecked by the difficulty of real-world reward design and of training reliable critics. We present FlowPRO, a reward-free offline reinforced fine-tuning framework for flow-matching VLAs.

arXiv CS 5d ago

Semantic Retrieval for Product Search in E-Commerce

arXiv:2606.01504v1. Announce Type: new. Abstract: Semantic retrieval in e-commerce must handle short, noisy, and colloquial queries over large product catalogs with fine-grained attribute distinctions. We present a Siamese LLM dual-encoder trained through a two-stage pipeline: contrastive learning with a false-negative margin mask to prevent penalization of near-duplicate products, followed by Relative Odds Alignment for Retrieval (ROAR), a preference optimization objective that extends...

arXiv CS 8d ago

Maximizing Mutual Information Between Prompt and Response Improves LLM Performance With No Additional Data

arXiv:2603.19294v4. Announce Type: replace. Abstract: While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains heavily rely on human-labeled data or external verifiers. Existing data has already been exploited, and new data is expensive to collect. Moreover, true intelligence goes far beyond verifiable tasks.

arXiv CS 5d ago

Non-obvious Manipulability in the Additively Separable Group Activity Selection Problem

Announce Type: new. Abstract: In this work, we study the additively separable Group Activity Selection Problem (AS-GASP) in an imperfect information setting, where agents have private preferences over activities and weights over other agents. Our goal is to design mechanisms that assign agents to activities based on their declared preferences and weights, with the objective of maximizing social welfare while ensuring truthful reporting. We, therefore, focus on the notion of non-obvious manipulability (NOM),...

arXiv CS 6d ago

Synthetic Contrastive Reasoning for Multi-Table Q&A

Announce Type: new. Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how answers are derived. To address this gap, we construct a synthetic contrastive reasoning-trace dataset for MMQA by generating validated positive traces and plausible negative...

arXiv CS 5d ago

ZIPP: Zero-shot Image Personalization from Personas

arXiv:2606.08841v1. Announce Type: new. Abstract: Text-to-image diffusion models are increasingly deployed in open-ended creative contexts, yet their outputs remain impersonal, optimized for aggregate aesthetics rather than individual taste. Human preferences are pluralistic: one user favoring muted, nostalgic portraits may prefer vibrant street photography, while another gravitates toward dreamy film aesthetics. Existing methods require dense interaction histories or per-user fine-tuning,...

arXiv CS 1d ago

Mixture-of-Experts Knowledge Graph Retrieval-Augmented Generation for Multi-Agent LLM-based Recommendation

Announce Type: replace. Abstract: Large language models (LLMs) have recently been adopted for recommendations due to their ability to understand user intent and item semantics. However, LLM-based recommender systems often rely on parametric knowledge and suffer from outdated knowledge, motivating knowledge graph retrieval-augmented generation (KG-RAG) to ground recommendations on structured, up-to-date KGs. Despite this promise, effective KG-RAG in recommendations faces great challenges.

arXiv CS 9d ago

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

arXiv:2605.30940v1. Announce Type: cross. Abstract: Real-time and accurate spatial audio generation is pivotal for delivering an immersive experience. However, existing spatial audio synthesis technologies are often encumbered by a tradeoff between generation quality and high inference latency, as well as difficulty in capturing precise spatial information from multimodal inputs. To address these challenges, we propose SwanSphere, a unified streaming framework for high-fidelity spatial audio...

arXiv CS 9d ago

Trump threatens consequences after Iran shoots down US helicopter

President Donald Trump announced that Iran shot down an American Apache helicopter late on Monday, promising consequences for the country on Tuesday as peace negotiations stall and an April ceasefire appears increasingly under strain. “There were two pilots involved, both are safe and uninjured,” he wrote on Truth Social. “Nevertheless, the United States must, of necessity, respond to this attack.”

Politico EU 1d ago