Proximalized Preference Optimization

Related Articles from SNS

FlowPRO: Reward-Free Reinforced Fine-Tuning of Flow-Matching VLAs via Proximalized Preference Optimization

Abstract: Post-training Vision-Language-Action (VLA) models into policies that can be reliably deployed on real robots remains a major bottleneck. SFT and DAgger exploit failure signals only indirectly, and reward-based RL is bottlenecked by the difficulty of real-world reward design and of training reliable critics. We present FlowPRO, a reward-free offline reinforced fine-tuning framework for flow-matching VLAs.

arXiv CS 5d ago

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

arXiv:2606.04807v1. Abstract: Mitigating social bias in Large Language Models (LLMs) presents a distinct alignment challenge: unlike verifiable tasks, bias lacks a single ground truth, creating a high-variance, subjective reward landscape. Previous preference-based fine-tuning methods have major trade-offs: Direct Preference Optimization (DPO) is limited by the lack of exploration inherent in offline training, while Proximal Policy Optimization (PPO) can lead to training...

arXiv CS 6d ago

When RLHF Fails: A Mechanistic Taxonomy of Reward Hacking, Collapse, and Evaluator Gaming

arXiv:2606.03238v1. Abstract: Reinforcement learning from human feedback (RLHF) makes large-scale post-training possible by replacing an underspecified human objective with learned and scalable proxies. The same substitution creates a structured failure surface: optimization can raise the learned reward while external quality falls, degrade both proxy and judge scores, reveal proxy under-alignment, or produce evaluator-specific disagreement. We present an empirical...

arXiv CS 7d ago

US strikes Iran in response to helicopter shootdown

Published June 9, 2026; last updated June 9, 2026. What you need to know:
- US Central Command announces strikes on Iran after downing of US Apache helicopter over the Strait of Hormuz
- Donald Trump had earlier threatened an attack on Iran over the helicopter shootdown
- Israeli airstrikes hit Tyre in southern Lebanon after the military ordered the entire city to evacuate
- Donald Trump says a Middle East peace deal is in the 'final throes'
- ...

Deutsche Welle 1d ago

US strikes Iran in response to Apache helicopter shootdown

Deutsche Welle 1d ago

Structural basis for chaperone-guided assembly of RNA-induced silencing complex

Abstract: The RNA-induced silencing complex (RISC), comprising an Argonaute (AGO) protein and a small RNA, is the central effector in RNA silencing. Small RNAs are loaded onto AGO as bulky duplexes in an HSP70- and HSP90-dependent process (refs. 1, 2, 3), but the molecular mechanism remains poorly understood. Here we identify the human AGO–HSP90–p23 complex, which captures AGO in an RNA-free state, termed the AGO maturation complex (AMC).

Nature 23h ago