Multi-Agent Lipschitz Bandits

arXiv:2602.16965v2

Abstract: We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward, while separating coordination costs from learning costs. We propose a modular protocol that first solves the multi-agent coordination problem by identifying and seating players on distinct, high-value regions via a novel maxima-directed search and then decouples the problem into \(N\) independent single-player Lipschitz bandits. In the consensus regime, we obtain an end-to-end regret bound whose dominant learning term is \(\tilde{O}(T^{(d+1)/(d+2)})\), matching the single-player Lipschitz rate; the upfront coordination cost is horizon-independent at fixed confidence and only polylogarithmic in \(T\) in the expected-regret form. Under an additional public coverage/scheduling assumption for the epochic extension, we also obtain a gap-free \(\tilde{O}(T^{(d+1)/(d+2)})\) guarantee. We further derive a matching lower bound for the dominant learning term and extend the framework to general distance-threshold collision models.
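To make two ingredients of the abstract concrete, here is a minimal Python sketch. It is not the paper's protocol. The function names are hypothetical. The grid-size rule is the standard uniform-discretization heuristic for single-player Lipschitz bandits (cell width on the order of \(T^{-1/(d+2)}\)), which is one common route to the \(T^{(d+1)/(d+2)}\) rate. The collision rule is one reading of "hard collisions yield zero reward" and of the distance-threshold extension; in particular, treating the threshold as inclusive and using Euclidean distance are assumptions.

```python
import math


def regret_exponent(d: int) -> float:
    """Exponent of T in the single-player Lipschitz rate T^((d+1)/(d+2))."""
    return (d + 1) / (d + 2)


def grid_points_per_axis(T: int, d: int) -> int:
    """Uniform-discretization heuristic (an assumption, not the paper's method):
    choose about T^(1/(d+2)) grid points per axis, i.e. cell width ~ T^(-1/(d+2))."""
    return max(1, math.ceil(T ** (1.0 / (d + 2))))


def collision_rewards(actions, raw_rewards, threshold=0.0):
    """Zero the reward of every player whose action lies within `threshold`
    (Euclidean distance, inclusive) of another player's action.
    threshold=0.0 models hard collisions: only identical actions collide."""
    out = []
    for i, a in enumerate(actions):
        collided = any(
            j != i and math.dist(a, b) <= threshold
            for j, b in enumerate(actions)
        )
        out.append(0.0 if collided else raw_rewards[i])
    return out


if __name__ == "__main__":
    # Three players on [0, 1]; the first two pick the same point and collide.
    actions = [(0.1,), (0.1,), (0.9,)]
    print(collision_rewards(actions, [1.0, 0.5, 0.7]))
    print(regret_exponent(1), grid_points_per_axis(500, 1))
```

As \(d\) grows, the exponent \((d+1)/(d+2)\) approaches 1, which is why separating the coordination cost from this learning term matters: the learning term already dominates in the stated bound.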
Originally published by arXiv CS.