
Albireo

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Scaling LLM Inference Beyond Amdahl's Limits via Eliminating Non-Scalable Overheads

arXiv:2606.01927v1. Abstract: Deployers of online LLM services usually seek to maximize cluster-wide performance given a fixed number of GPUs. Tensor parallelism (TP) is necessary to fit modern models but scales sub-linearly as the TP degree t grows, due to cross-GPU communication and non-scalable runtime work, as predicted by Amdahl's Law. Conversely, increasing t improves memory efficiency and alleviates KV-cache contention and swapping.
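
The sub-linear scaling mentioned in the abstract can be illustrated with the textbook form of Amdahl's Law. The sketch below is not taken from the paper: the function names and the example non-scalable fraction of 0.1 are assumptions chosen purely for illustration, and the model ignores the memory-efficiency benefits of larger t that the abstract also describes.

```python
def amdahl_speedup(t: int, serial_fraction: float) -> float:
    """Textbook Amdahl's Law: speedup on t GPUs when `serial_fraction`
    of the work (e.g. communication, runtime overhead) does not scale."""
    if t < 1:
        raise ValueError("t must be >= 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / t)


def per_gpu_efficiency(t: int, serial_fraction: float) -> float:
    """Fraction of ideal linear scaling actually achieved at TP degree t."""
    return amdahl_speedup(t, serial_fraction) / t


if __name__ == "__main__":
    # Hypothetical non-scalable fraction; not a figure from the paper.
    s = 0.1
    for t in (1, 2, 4, 8):
        print(f"t={t}: speedup={amdahl_speedup(t, s):.2f}, "
              f"efficiency={per_gpu_efficiency(t, s):.2f}")
```

Under this simple model the speedup can never exceed 1 / serial_fraction, however large t becomes, which is why reducing the non-scalable share of the work matters more than adding GPUs once t is large.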

arXiv CS 8d ago