Logit-Free On-Policy Distillation

Related Articles from SNS

OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification

arXiv:2606.01476v1. Abstract: On-Policy Distillation (OPD) trains a student model on its own generative trajectories under dense token-level feedback from a stronger teacher, mitigating both the off-policy distribution shift of Supervised Fine-Tuning (SFT) and the sparse credit assignment of Reinforcement Learning (RL). However, standard OPD faces two coupled limitations. First, it requires direct access to the teacher's token-level logits, excluding a broad class of...

arXiv CS 8d ago