POPO

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning

arXiv:2606.01281v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, its effectiveness is substantially hindered by the prevalence of ineffective training data: many sampled prompts yield response groups that are either entirely correct or entirely incorrect, resulting in zero-variance rewards and limited learning signals. Recent...

arXiv CS 8d ago

Taiji: Pareto Optimal Policy Optimization with Semantics-IDs Trade-off for Industrial LLM-Enhanced Recommendation

arXiv:2606.03866v1 Announce Type: new Abstract: Scaling recommender systems via large language models (LLMs) has become a prominent trend in the industry. However, aligning the LLM's semantic space with the recommender's ID space via post-training (e.g., SFT and RL) remains challenging. Existing LLM4Rec paradigms are bottlenecked by two main issues: (1) the difficulty of measuring and improving chain-of-thought (CoT) quality in open-domain recommendation during SFT, and (2) the neglect of...

arXiv CS 7d ago

Sovereign News Station

Self-hosted. No tracking. No ads. Independent news intelligence powered by sovereign infrastructure.

Daily briefing to your inbox:

Subscribed. Welcome aboard.

Home Live Analysis Trending Analytics Operations RSS Feed About

Sovereign News Station — Independent news intelligence · Self-hosted · No tracking