Markov Decision Processes
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Policy Gradient for Continuous-Time Robust Markov Decision Processes
Announce Type: new Abstract: The framework of robust Markov decision processes (RMDPs) allows the design of reinforcement learning agents that satisfy performance guarantees under worst-case transition dynamics. Traditional RMDPs consider discrete-time dynamics and recently, sample-efficient policy gradient algorithms have been considered in this context. This paper investigates policy gradient algorithms within a continuous-time RMDP framework.
Policy Gradient for Continuous-Time Robust Markov Decision Processes
arXiv:2606.04335v2 Announce Type: replace Abstract: The framework of robust Markov decision processes (RMDPs) allows the design of reinforcement learning agents that satisfy performance guarantees under worst-case transition dynamics. Traditional RMDPs consider discrete-time dynamics and recently, sample-efficient policy gradient algorithms have been considered in this context. This paper investigates policy gradient algorithms within a continuous-time RMDP framework.
Randomization for Faster Exact Optimization of Discounted Markov Decision Processes
arXiv:2606.05110v1 Announce Type: new Abstract: We provide faster deterministic and randomized algorithms for exactly solving discounted Markov Decision Processes (DMDPs). We obtain our results by efficiently reducing computing optimal values and policies in DMDPs to the easier tasks of policy evaluation and computing approximately optimal values in DMDPs. We provide both a straightforward deterministic reduction and a more efficient randomized variant that, together with advances in...
Bayesian learning for the stochastic shortest path problem
Announce Type: cross Abstract: Sequential decision-making problems are often modelled as a Markov decision process (MDP). We focus on the stochastic shortest path (SSP) problem, which is an infinite-horizon undiscounted MDP with absorbing terminal states. We develop a Bayesian framework to learn the optimal decision strategy through interactions with the decision-making task.
Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs
arXiv:2601.23229v2 Announce Type: replace Abstract: Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games.
Experience-Driven Dynamic Exits for LLMs with Reinforcement Learning
arXiv:2606.03113v1 Announce Type: new Abstract: Large Language Models suffer from slow autoregressive inference. While self-speculative decoding accelerates this process, its efficiency is hampered by static configurations like fixed exit layers and speculation lengths. We reframe this optimization as a \textbf{Markov Decision Process} and propose \textbf{LEDE}, a framework that uses offline reinforcement learning.
PatchWorld: Gradient-Free Optimization of Executable World Models
arXiv:2605.30880v1 Announce Type: new Abstract: Text-agent environments are typically modeled as partially observable Markov decision processes (POMDPs), assuming that the simulator's latent state and transition dynamics are hidden from the agent. Yet little work has examined whether executable code can be induced to serve as a world model for prediction and planning under partial observability. We introduce PatchWorld, a gradient-free framework that turns offline trajectories into...
Risk-Aware Planning for Transit Desert Remediation Under Demand Uncertainty
Announce Type: new Abstract: Transit deserts are areas where public transportation is inadequate despite evidence of travel demand, a condition that affects tens of millions of residents across the Americas. Planning for these areas is difficult because the usual demand signal is missing: ridership cannot be observed before service exists. To address that setting, we formulate risk-aware transit desert remediation as a partially observable Markov decision process with Conditional...
Vectorized Online POMDP Planning
arXiv:2510.27191v5 Announce Type: replace Abstract: Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for planning under partial observability problems, capturing the stochastic effects of actions and the limited information available through noisy observations. POMDP solving could benefit tremendously from massive parallelization on today's hardware, but parallelizing...
The Challenges of Using Reinforcement Learning for Controlling Industrial Energy Systems
Announce Type: new Abstract: Reinforcement learning has shown promising results for optimizing the control of industrial energy systems, yet most existing studies remain limited to the application in simulation environments. We investigate the challenges of deploying reinforcement learning in a real-world industrial energy system, considering a thermal heating network as a use case. We formulate the task as a Markov Decision Process and systematically analyze the associated challenges along...