Home Knowledge Base PBSD

PBSD

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment

arXiv:2606.09348v1 Announce Type: new Abstract: Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-base reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is especially pronounced in multi-turn search agents, where successful trajectories may contain misleading actions and failed trajectories may contain...

arXiv CS 1d ago