NanoSpec
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
NanoSpec: Accelerating Speculative Decoding using Minimalist In-Context Vocabularies
arXiv:2605.26444v2 Announce Type: replace Abstract: The massive vocabulary sizes of large language models, often exceeding 100k tokens, impose a computational bottleneck on the final linear projection layer during speculative decoding. Existing vocabulary pruning solutions rely on static or coarsely-grained sub-vocabularies that necessitate large active sizes ($\sim$30k) to maintain draft quality. We propose NanoSpec, a novel training-free approach that breaks this trade-off by dynamically...