Robust Restless Multi-Armed Bandit for Data Center Flexibility Services Through Virtual Machine Scheduling

arXiv:2605.19116v2 (announce type: replace)

Abstract: Energy demand from data centers has surged in recent years and stressed the grid. Electric grids must balance supply and demand every second, which motivates demand response (load reduction) from large loads, including data centers. Data centers can provide this reduction by rescheduling jobs on their physical machines. Real-time implementation is uncertain, however, because resource utilization fluctuates, and rescheduling incurs quality-of-service (QoS) losses that providers are unwilling to disclose. We propose a restless multi-armed bandit (RMAB) framework in which the grid operator requests load reductions without access to detailed job-rescheduling procedures. Using open-source virtual machine (VM) datasets, we model job arrivals and rescheduling at each data center as a restless arm in a Markov decision process (MDP), learn the transition function via Thompson sampling, and derive Whittle-index-based policies from the learned model. Because learning slows as the state space grows, we use a mixed strategy that includes a global upper confidence bound (UCB) and encodes trust indices to enhance robustness and accelerate learning. Results show that the proposed mixed-strategy algorithm remains robust across varying state-space sizes and consistently outperforms the pure Thompson-Whittle (TW) algorithm, especially when contextual information is noisy. It also outperforms the state-of-the-art EXP4 framework. We provide open-source code to ensure reproducibility.
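
To make the "learned transition function via Thompson sampling" step concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small discrete state space per arm and a Dirichlet prior over each row of the transition function; the names `TransitionPosterior` and `sample_dirichlet`, the two-state example, and the 0.9/0.1 "true" dynamics are all hypothetical. In a Thompson-Whittle style loop, the sampled model would then be passed to a Whittle-index computation, which is omitted here.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) using normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


class TransitionPosterior:
    """Dirichlet posterior over P(s_next | s, a) for a single arm (illustrative helper)."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # counts[s][a][s_next] holds the Dirichlet pseudo-count for that transition.
        self.counts = [
            [[prior] * n_states for _ in range(n_actions)]
            for _ in range(n_states)
        ]

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[s][a][s_next] += 1.0

    def sample(self, rng):
        """Thompson step: draw one plausible transition function from the posterior."""
        return [
            [sample_dirichlet(alphas, rng) for alphas in per_action]
            for per_action in self.counts
        ]


# Hypothetical example: two states, two actions (0 = passive, 1 = request reduction).
rng = random.Random(0)
true_row = [0.9, 0.1]  # assumed true dynamics for state 0 under action 0
posterior = TransitionPosterior(n_states=2, n_actions=2)

for _ in range(200):
    s_next = rng.choices([0, 1], weights=true_row)[0]
    posterior.update(0, 0, s_next)

sampled = posterior.sample(rng)
print(sampled[0][0])  # one posterior draw for the row P(. | s=0, a=0)
```

With 200 observations, draws for the observed row concentrate near the assumed true probabilities, while rows that were never visited remain close to uniform draws from the prior. That difference illustrates why learning takes longer as the state space grows: more rows need to be visited before the sampled model becomes reliable.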
Originally published by arXiv CS.