Home › Knowledge Base › Multi-Armed Bandit

Multi-Armed Bandit

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Robust Restless Multi-Armed Bandit for Data Center Flexibility Services Through Virtual Machine Scheduling

arXiv:2605.19116v2 Announce Type: replace Abstract: Energy demands from data centers have surged and stressed the grid in recent years. Electric grids require balancing supply and demand every second, motivating demand response (reduction) from large loads, including data centers. This can be achieved by rescheduling jobs on a physical machine.

arXiv CS 2d ago

Throughput Optimization for Multi-AP IEEE P802.11bq Networks Based on Combinatorial Multi-Armed Bandits

arXiv:2606.03528v1 Announce Type: new Abstract: This paper addresses distributed throughput optimization for dense multi-AP IEEE P802.11bq networks. We develop a packet-level model that jointly captures cross-link carrier-sense multiple access with collision avoidance (CSMA/CA), sub-7GHz RTS/CTS exchange, beam-training overhead, directional mmWave interference, signal-to-interference-plus-noise-ratio (SINR)-based MCS selection, and retransmissions. The resulting configuration problem is...

arXiv CS 7d ago

Online Learning with Recency: Algorithms for Sliding-window Streaming Multi-armed Bandits

arXiv:2606.08977v1 Announce Type: new Abstract: Motivated by the recency effect in online learning, we study algorithms for single-pass *sliding-window streaming multi-armed bandits (MABs)* In this setting, we are given $n$ arms with unknown sub-Gaussian reward distributions and a parameter $W$. The arms arrive in a single-pass stream, and only the most recent $W$ arms are considered valid.

arXiv CS 1d ago

ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization

arXiv:2603.21180v4 Announce Type: replace Abstract: Sequential experimental design under expensive, gradient-free objectives is a central challenge in computational statistics: evaluation budgets are tightly constrained and information must be extracted efficiently from each observation. We propose \textbf{ALMAB-DC}, a GP-based sequential design framework combining active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experimentation. A...

arXiv CS 6d ago

Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

arXiv:2606.09002v1 Announce Type: cross Abstract: We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight inappropriate. We instead evaluate performance relative to the best arm currently available, leading to a dynamic-regret criterion for arriving-arm environments.

arXiv CS 1d ago

Autonomous Air-Ground Vehicle Operations Optimization in Hazardous Environments: A Multi-Armed Bandit Approach

arXiv:2508.08217v2 Announce Type: replace Abstract: Hazardous environments such as chemical spills, radiological zones, and bio-contaminated sites pose significant threats to human safety and public infrastructure. Rapid and reliable hazard mitigation in these settings often unsafe for humans, calling for autonomous systems that can adaptively sense and respond to evolving risks. This paper presents a decision-making framework for autonomous vehicle dispatch in hazardous environments with...

arXiv CS 1d ago

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

Announce Type: replace Abstract: LLM-guided evolutionary search (Evolve systems) has reached state-of-the-art results on mathematical and combinatorial tasks, yet most existing systems report only the best of many runs and leave the run-to-run distribution undocumented. We ask how a fixed budget of LLM calls should be allocated, and how reliably a single run reaches the reported numbers. Sweeping the depth-breadth grid over five models and three tasks, we identify two empirical regularities:...

arXiv CS 9d ago

Bandit Simulation for Average Reward Inference

arXiv:2606.00913v1 Announce Type: cross Abstract: Multi-arm bandit algorithms are increasingly used in online platforms, clinical trials, and social science experiments, but valid statistical inference on their performance remains an open challenge. After deploying bandits, a natural question is whether one can construct a confidence interval for its mean reward and assess whether it reliably outperforms a baseline policy. The total reward achieved in any single bandit deployment is random,...

arXiv CS 8d ago

Best-Arm Identification-Based Trust Region Selection for Bayesian Optimization on Multimodal Functions

arXiv:2605.31050v1 Announce Type: new Abstract: Gaussian process-based Bayesian optimization (BO) is a popular approach for expensive black-box optimization, but its performance often degrades on complex multimodal or high-dimensional problems. Trust region-based BO mitigates this issue by focusing on local regions, and recent studies suggest that selecting an effective region can be formulated as a multi-armed bandit problem. We propose a trajectory-aware framework that integrates best-arm...

arXiv CS 9d ago

Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making

arXiv:2510.16462v3 Announce Type: replace Abstract: This work introduces MAYA, a sequential imitation learning model based on multi-armed bandits, designed to reproduce and predict individual bees' decisions in contextualized foraging tasks. The model accounts for bees' limited memory through a temporal window $\tau$, whose optimal value is around 7 trials, with a slight dependence on weather conditions. Experimental results on real, simulated, and complementary (mice) datasets show that...

arXiv CS 7d ago