OGLS-SD
Related Articles from SNS
OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning
Abstract: We study on-policy self-distillation (OPSD), where a language model improves its reasoning ability by distilling privileged teacher distributions along its own on-policy trajectories. Despite its promise, OPSD can suffer from training instability due to a pattern mismatch between teacher and student responses. Self-reflected teacher responses may introduce reflection-induced biases and response templates that miscalibrate token-level supervision, ultimately...
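To make the phrase "distilling teacher distributions along on-policy trajectories" concrete, here is a minimal toy sketch of the generic idea, not the paper's method. It assumes a token-level KL(teacher || student) objective averaged over positions of a trajectory sampled from the student; the KL direction, the three-token vocabulary, and the names `toy_student`, `toy_teacher`, `sample_trajectory`, and `opsd_loss` are all hypothetical choices made for illustration. The outcome-guided logit steering named in the title is not described in the visible abstract and is not modeled here.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def sample_trajectory(student_logits_fn, length, rng):
    """Sample tokens from the student itself (the 'on-policy' part)."""
    tokens = []
    for _ in range(length):
        probs = softmax(student_logits_fn(tokens))
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens


def opsd_loss(student_logits_fn, teacher_logits_fn, tokens):
    """Mean per-token KL(teacher || student) along a student-sampled trajectory.

    Assumption: the KL direction and the plain averaging are illustrative
    choices, not taken from the source.
    """
    per_token = []
    for t in range(len(tokens)):
        prefix = tokens[:t]
        p_teacher = softmax(teacher_logits_fn(prefix))
        p_student = softmax(student_logits_fn(prefix))
        per_token.append(kl_divergence(p_teacher, p_student))
    return sum(per_token) / len(per_token)


# Hypothetical stand-ins: a uniform student and a teacher that, thanks to
# some privileged information, prefers token 0 regardless of the prefix.
def toy_student(prefix):
    return [0.0, 0.0, 0.0]


def toy_teacher(prefix):
    return [2.0, 0.0, -1.0]


if __name__ == "__main__":
    rng = random.Random(0)
    trajectory = sample_trajectory(toy_student, length=5, rng=rng)
    print("trajectory:", trajectory)
    print("loss:", opsd_loss(toy_student, toy_teacher, trajectory))
```

In a real setup the two logit functions would be the same language model queried with and without privileged context, and the loss would be minimized with respect to the student's parameters; the toy functions above only show where the teacher and student distributions enter the objective.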