NFP

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

How Much Parallelism Is "Free"? A Principle of Near-Free Parallelism for Parallel Decoding

arXiv:2605.30851v1. Abstract: Parallel decoding improves generation efficiency by processing multiple decode positions within a single decode forward, but reported speedups conflate algorithmic token utilization with the system cost of executing multiple positions. We isolate the system side by introducing Near-Free Parallelism (NFP), the maximum number of positions executable at near-free latency. Analyzing Dense FFNs, MoE FFNs, and Attention against an idle-compute...

arXiv CS 9d ago