KVCache
Related Articles from SNS
Simple is Better: Multiplication May Be All You Need for LLM Request Scheduling
Abstract: High-quality LLM request scheduling requires meeting two key objectives: ensuring the routed instance has KVCache to accelerate request execution, and ensuring that the workload is balanced across instances. Achieving both objectives is challenging because pursuing one may compromise the other. Current approaches use various combinators (e.g., linear combinations) to compute a scheduling score that combines indicators for the two objectives.
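To make the idea of a scheduling score concrete, here is a minimal sketch in Python. It is not the paper's formula: the `Instance` fields, the weights, and both scoring functions are hypothetical, chosen only to show how a linear combination of a KVCache indicator and a load indicator can be computed, and how a multiplicative combinator (as the title hints) differs in form.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    """Hypothetical view of one serving instance, as seen by the scheduler."""
    name: str
    cache_hit_ratio: float  # assumed: share of the prompt already in KVCache, in [0, 1]
    load: float             # assumed: normalized load in [0, 1]; 1.0 means saturated


def linear_score(inst: Instance, w_cache: float = 0.5, w_load: float = 0.5) -> float:
    # Linear combination of the two indicators; the weights are illustrative.
    return w_cache * inst.cache_hit_ratio + w_load * (1.0 - inst.load)


def product_score(inst: Instance) -> float:
    # Multiplicative combinator (an assumption for illustration, no weights to tune).
    return inst.cache_hit_ratio * (1.0 - inst.load)


def route(instances: List[Instance], score: Callable[[Instance], float]) -> Instance:
    # Pick the instance with the highest score.
    return max(instances, key=score)


instances = [
    Instance("A", cache_hit_ratio=0.9, load=0.95),  # warm cache, nearly saturated
    Instance("B", cache_hit_ratio=0.3, load=0.20),  # some cache, lightly loaded
    Instance("C", cache_hit_ratio=0.0, load=0.00),  # cold cache, idle
]

balanced = route(instances, linear_score)
cache_heavy = route(instances, lambda i: linear_score(i, w_cache=0.8, w_load=0.2))
multiplied = route(instances, product_score)
print(balanced.name, cache_heavy.name, multiplied.name)
```

With these made-up numbers, equal weights route to B, while cache-heavy weights route to the nearly saturated instance A. This illustrates the tension the abstract describes: favoring KVCache reuse can compromise load balance, and the outcome of a linear score depends on how the weights are chosen.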