Home Knowledge Base OGLS-SD

OGLS-SD

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

Announce Type: replace Abstract: We study on-policy self-distillation (OPSD), where a language model improves its reasoning ability by distilling privileged teacher distributions along its own on-policy trajectories. Despite its promise, OPSD can suffer from training instability due to a pattern mismatch between teacher and student responses. Self-reflected teacher responses may introduce reflection-induced biases and response templates that miscalibrate token-level supervision, ultimately...

arXiv CS 8d ago