NVFP4 LLM Distillation

Related Articles from SNS

Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation

arXiv:2606.05682v1. Abstract: Demand for low-precision inference, including NVFP4-based approaches, has grown as large language models are increasingly deployed in latency- and cost-constrained production environments. Quantization-aware distillation (QAD) helps recover accuracy lost under low-bit quantization by training a quantized student to match the output distribution of a frozen higher-precision teacher via a KL-divergence loss. In this work, we first provide a...

arXiv CS 5d ago
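
The output-matching objective that the abstract attributes to QAD can be illustrated with a minimal sketch. This is not the paper's implementation. The function names, the temperature parameter, and the choice of the forward direction KL(teacher || student) are all assumptions made for illustration, and NVFP4 quantization itself is not modeled: the student logits are simply taken as given.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    # log-sum-exp with the max subtracted to avoid overflow
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for a single token position.

    Assumption: forward KL with the frozen teacher as the reference
    distribution. The abstract does not state the direction.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (frozen)
    log_q = log_softmax(student_logits, temperature)  # student (quantized)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Hypothetical logits for a 3-token vocabulary.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]

loss_same = kl_distillation_loss(teacher, teacher)
loss_diff = kl_distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's distribution exactly and positive otherwise, so minimizing it pulls the student's outputs toward the teacher's. In a real training loop this quantity would be averaged over token positions and backpropagated through the student only.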