Dual-Margin Dynamic beta Adjustment

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

DynamicPO: Dynamic Preference Optimization for Recommendation

arXiv:2605.00327v2. Abstract: In large language model (LLM)-based recommendation systems, direct preference optimization (DPO) effectively aligns recommendations with user preferences, requiring multi-negative objective functions to leverage abundant implicit-feedback negatives and sharpen preference boundaries. However, our empirical analyses reveal a counterintuitive phenomenon, preference optimization collapse, where increasing the number of negative samples can lead...

arXiv CS 1d ago