TreeFlash

Related Articles from SNS

TreeFlash: Parallel AR-Approximation for Faster Speculative Decoding

Abstract: One-shot block drafters for speculative decoding generate the full draft in a single forward pass, achieving strong throughput by eliminating sequential token generation. However, they predict each draft token conditioned only on the prefix context, with no dependence on previously drafted tokens. This non-autoregressive conditioning causes the drafter's distribution to diverge from the verifier's true autoregressive distribution as draft depth grows.
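
The divergence described in the abstract can be illustrated with a toy model. The sketch below is not TreeFlash's method; it assumes a hypothetical 3-token Markov chain as the verifier and an idealized block drafter that knows the exact prefix-only (marginal) distribution at each draft position. The matrix `T` and the function names are made up for illustration. It measures the expected total variation distance between the drafter's prefix-only prediction and the verifier's autoregressive conditional at each depth.

```python
# Toy illustration only: a 3-token Markov chain stands in for the verifier.
# The matrix and all names below are hypothetical, not taken from the paper.

# T[i][j] = verifier probability of token j given that the previous token is i
T = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

def step(dist, T):
    """Push a distribution over the previous token one step through T."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

def tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def expected_gap_by_depth(T, last_prefix_token, depth):
    """Expected TV gap between a prefix-only drafter and the AR verifier.

    At each draft position the idealized block drafter outputs the marginal
    distribution given only the prefix. The verifier conditions on the actual
    previous token, which is averaged over here using its own marginal.
    """
    n = len(T)
    # Distribution over the token just before the first draft position.
    prev = [1.0 if i == last_prefix_token else 0.0 for i in range(n)]
    gaps = []
    for _ in range(depth):
        block_q = step(prev, T)  # prefix-only prediction for this position
        gap = sum(prev[i] * tv(T[i], block_q) for i in range(n))
        gaps.append(gap)
        prev = block_q  # marginal over this position feeds the next one
    return gaps

gaps = expected_gap_by_depth(T, last_prefix_token=0, depth=4)
for d, g in enumerate(gaps, start=1):
    print(f"depth {d}: expected TV gap = {g:.4f}")
```

In this toy setting the gap is zero at depth 1, where the prefix fully determines the verifier's conditional, and becomes positive at deeper positions because the drafter cannot see which tokens were drafted before. Real drafters and verifiers are neural models, so the numbers here carry no quantitative meaning.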

arXiv CS 7d ago