PR2

Related Articles from SNS

PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning

Abstract: Mixture of Experts (MoE) Large Language Models (LLMs) achieve strong performance at scale. However, reinforcement learning (RL) on MoE-based LLMs often suffers from training instability. A root cause is router drift: expert activations can change drastically across model updates and can differ between the disaggregated rollout and training phases, causing a large rollout-training mismatch and unstable importance sampling weights in PPO-style RL algorithms.

arXiv CS 7d ago

Direct Informed Sampling on Riemannian Manifolds via Loewner Order Lower Bounds

arXiv:2606.02879v1. Abstract: Informed sampling techniques accelerate sampling-based motion planners by focusing the search on promising regions of the state space, yet most existing methods rely on Euclidean heuristics that become inadmissible under configuration-dependent Riemannian metrics. While scalar eigenvalue bounds restore admissibility by uniformly scaling the Euclidean distance, they discard the directional structure of the metric, producing overly conservative...

arXiv CS 7d ago