Upper Confidence Bound

Related Articles from SNS

Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning

arXiv:2603.03480v2. We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces,...

arXiv CS 7d ago

A Joint Finite-Sample Certificate for Adaptive Selective Conformal Risk Control

Selective predictors answer on confident inputs and abstain elsewhere; deploying one safely needs a single finite-sample certificate that simultaneously upper-bounds the selected risk, lower-bounds the acceptance probability $p_{\mathrm{acc}}$ above a floor $p_{\min}$, and lower-bounds the deployment utility. This certificate must be valid under adaptive threshold selection from a finite grid of $m$ pairs on $n_{\mathrm{cert}}$ samples. We give such a certificate for bounded, possibly...

arXiv CS 1d ago

Multi-Armed Sequential Hypothesis Testing by Betting

arXiv:2603.17925v2. We consider a variant of sequential testing by betting where, at each time step, the statistician is presented with multiple data sources (arms) and obtains data by choosing one of the arms. We consider the composite global null hypothesis $\mathscr{P}$ that all arms are null in a certain sense (e.g. all dosages of a treatment are ineffective) and we are interested in rejecting $\mathscr{P}$ in favor of a composite alternative...

arXiv CS 5d ago

Scenario Generation for Risk-Aware Reinforcement Learning with Probably Approximately Safe Guarantees

Guaranteeing safety is critical to the deployment of reinforcement learning (RL) agents in the real world, especially as policies learned using deep RL may be susceptible to transition perturbations that result in unknown or unsafe behaviour. One method of policy verification is to construct probabilistic barrier certificates by sampling policy trajectories with respect to safety constraints, thereby demarcating known safe behaviour from unknown...

arXiv CS 2d ago

Scenario Generation for Risk-Aware Reinforcement Learning with Probably Approximately Safe Guarantees

arXiv:2606.04812v1. Guaranteeing safety is critical to the deployment of reinforcement learning (RL) agents in the real world, especially as policies learned using deep RL may be susceptible to transition perturbations that result in unknown or unsafe behaviour. One method of policy verification is to construct probabilistic barrier certificates by sampling policy trajectories with respect to safety constraints, thereby demarcating known safe behaviour...

arXiv CS 6d ago

Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis

Can targeted user training unlock the productive potential of generative artificial intelligence in professional settings? We study this question using a randomized experiment in which 164 law students completed an issue-spotting examination under one of three conditions: no GenAI access, optional access to a large language model (LLM), or LLM access with a brief training intervention. Untrained LLM access proved counterproductive: relative to participants...

arXiv CS 2d ago

Autonomous Air-Ground Vehicle Operations Optimization in Hazardous Environments: A Multi-Armed Bandit Approach

arXiv:2508.08217v2. Hazardous environments such as chemical spills, radiological zones, and bio-contaminated sites pose significant threats to human safety and public infrastructure. Rapid and reliable hazard mitigation in these settings is often unsafe for humans, calling for autonomous systems that can adaptively sense and respond to evolving risks. This paper presents a decision-making framework for autonomous vehicle dispatch in hazardous environments with...

arXiv CS 1d ago

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

arXiv:2603.02630v2. Large Language Models (LLMs) have achieved great success in many real-world applications, especially when serving as the cognitive backbone of Multi-Agent Systems (MAS) that orchestrate complex workflows in practice. Since many deployment scenarios preclude modifying the MAS workflow, and MAS performance is highly sensitive to the input prompts, prompt optimization emerges as a more natural approach to improving that performance. However,...

arXiv CS 9d ago

Bayesian Optimization of a Multi-Product Chemical Reactor Using Composite Models and Partial Physics Knowledge

arXiv:2606.08611v1. We study data-driven real-time economic optimization of a multi-product chemical reactor when no reliable first-principles model is available beyond a steady-state energy balance. Instead of learning the economic objective directly as a black-box function, we use a composite formulation in which Gaussian process (GP) models predict physically meaningful outputs, including product concentrations and reactor temperature, while profit is computed...

arXiv CS 1d ago

Robust Restless Multi-Armed Bandit for Data Center Flexibility Services Through Virtual Machine Scheduling

arXiv:2605.19116v2. Energy demands from data centers have surged and stressed the grid in recent years. Electric grids require balancing supply and demand every second, motivating demand response (reduction) from large loads, including data centers. This can be achieved by rescheduling jobs on a physical machine.

arXiv CS 2d ago