FCFS

Related Articles from SNS

Tail Optimality and Performance Analysis of the Nudge*(M) Scheduling Algorithm

Abstract: Recently it was shown that the response time of First-Come-First-Served (FCFS) scheduling can be stochastically and asymptotically improved upon by the Nudge scheduling algorithm in the case of light-tailed job size distributions. Such improvements are feasible even when the jobs are partitioned into two types and the scheduler only has information about the type of incoming jobs (but not their size). In this paper we introduce Nudge*(M) scheduling, where...

arXiv CS 8d ago

Fairness-Aware and Latency-Controllable Scheduling for Chunked-Prefill LLM Serving

arXiv:2606.09061v1 Abstract: As large language models (LLMs) are increasingly deployed with highly heterogeneous workloads, chunked-prefill execution has emerged as a mainstream serving architecture. Balancing scheduling fairness and latency stability in such environments is critical; otherwise, severe head-of-line blocking and request starvation will degrade user experience. However, existing systems rely on rigid First-Come, First-Served (FCFS) policies and static token...

arXiv CS 1d ago

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

Abstract: Serial LLM inference backends -- such as Ollama -- process requests one at a time under FCFS admission, causing Head-of-Line Blocking (HOLB) under mixed workloads at high utilisation: short factual queries can be delayed by minutes behind long generation jobs. While cloud-scale deployments mitigate HOLB via continuous batching (vLLM, Orca), these solutions require tens of GB of VRAM for concurrent KV-caches -- infeasible for memory-constrained edge and local...

arXiv CS 2d ago
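
The head-of-line blocking described in the abstract above can be illustrated with a small sketch. It is a generic comparison of FCFS and shortest-job-first (SJF) ordering on a serial backend, not the method of the paper; the workload values and the helper name `mean_wait` are hypothetical, and job sizes are assumed to be known exactly, which a real scheduler would have to predict.

```python
from itertools import accumulate


def mean_wait(service_times):
    """Mean waiting time when jobs run one at a time in the given order.

    Assumes every job is already queued at t=0 (a simplification).
    """
    # Each job waits for the total service time of the jobs ahead of it.
    starts = [0.0] + list(accumulate(service_times))[:-1]
    return sum(starts) / len(service_times)


# Hypothetical workload in seconds: one long generation job
# queued just ahead of three short factual queries.
jobs = [120.0, 2.0, 2.0, 2.0]

fcfs_wait = mean_wait(jobs)          # serve in arrival order
sjf_wait = mean_wait(sorted(jobs))   # serve shortest first (sizes assumed known)

print(f"FCFS mean wait: {fcfs_wait:.1f} s")
print(f"SJF mean wait:  {sjf_wait:.1f} s")
```

Under FCFS the three short queries each wait behind the long job, so the mean wait is dominated by it; reordering by size removes most of that delay at the cost of making the long job wait slightly longer.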

Terastal: Layer-Variant-based Scheduling for Real-Time Multi-DNN Workloads on Heterogeneous Accelerators

Abstract: Heterogeneous DNN accelerators improve soft real-time multi-DNN execution by mapping each layer to its preferred accelerator to reduce latency. However, under skewed workloads, large layer-latency differences across accelerators limit scheduling flexibility and increase deadline misses. To address this challenge, we introduce layer variants, customized layer implementations that reduce latency gaps on non-preferred accelerators.

arXiv CS 2d ago