POMDP

Related Articles from SNS

Think Fast and Far: Long-Horizon Online POMDP Planning via Rapid State Sampling

Abstract: Partially Observable Markov Decision Processes (POMDPs) are a general and principled framework for motion planning under uncertainty. Despite tremendous improvement in the scalability of POMDP solvers, long-horizon POMDPs remain difficult to solve. To alleviate the difficulty, this paper proposes a new approximate online POMDP solver, called Reference-Based Online POMDP Planning via Rapid State Space Sampling (ROP-RAS3).

arXiv CS 6d ago

Vectorized Online POMDP Planning

arXiv:2510.27191v5. Abstract: Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for planning under partial observability problems, capturing the stochastic effects of actions and the limited information available through noisy observations. POMDP solving could benefit tremendously from massive parallelization on today's hardware, but parallelizing...

arXiv CS 6d ago
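The abstract above credits POMDPs with capturing both stochastic action effects and noisy observations. The standard way an agent folds both into its state estimate is the Bayesian belief update, b'(s') proportional to O(o | s', a) * sum over s of T(s' | s, a) * b(s). Below is a minimal sequential sketch in pure Python, not the vectorized solver the paper proposes. The example models and names (`belief_update`, `LISTEN`, `HEAR_LEFT`) are hypothetical; the numbers follow the listening action of the classic tiger benchmark, where the agent hears the correct side with probability 0.85.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s') proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [sum(T[action][s][s_next] * belief[s] for s in range(n))
                 for s_next in range(n)]
    # Correction step: weight each next state by the observation likelihood.
    unnormalized = [O[action][s_next][observation] * predicted[s_next]
                    for s_next in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical tiger-problem models: state 0 = tiger left, state 1 = tiger right.
LISTEN = 0
HEAR_LEFT = 0
T = [[[1.0, 0.0], [0.0, 1.0]]]        # T[a][s][s']: listening never moves the tiger
O = [[[0.85, 0.15], [0.15, 0.85]]]    # O[a][s'][o]: correct side heard 85% of the time

b = [0.5, 0.5]
b = belief_update(b, LISTEN, HEAR_LEFT, T, O)  # approximately [0.85, 0.15]
b = belief_update(b, LISTEN, HEAR_LEFT, T, O)  # approximately [0.970, 0.030]
print(b)
```

Each entry of `predicted` is computed independently of the others, which is one reason belief computations map naturally onto parallel hardware; how the paper actually parallelizes its solver is not described in the excerpt.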

High entropy leads to symmetry-equivariant policies in Dec-POMDPs

arXiv:2511.22581v5. Abstract: We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP. In particular, policies coming from different initializations will be fully compatible, in that their cross-play returns are equal to their...

arXiv CS 2d ago
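A toy illustration of the abstract's claim, offered as a hedged sketch rather than the paper's construction or proof: a hypothetical one-step, two-agent coordination game (a degenerate Dec-POMDP with a single state and no observations) in which both agents score 1 only when they pick the same action. Each agent runs simultaneous gradient ascent, with a tabular softmax policy, on the expected shared reward plus `tau` times its own policy entropy. The step size, iteration count, initial logits, and the two `tau` values are our assumptions. With low `tau`, the two runs lock into opposite conventions and miscoordinate in cross-play; with high `tau`, both runs converge to the uniform policy, which is unchanged by relabeling the actions, so cross-play matches self-play.

```python
import math

# Hypothetical toy game: two agents, two actions, one state, no observations.
# REWARD[a1][a2] is the shared reward; the agents score 1 only if they match.
REWARD = [[1.0, 0.0],
          [0.0, 1.0]]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(init1, init2, tau, lr=0.5, steps=5000):
    """Simultaneous gradient ascent on E[R] + tau * H(pi_i) for each agent i."""
    theta = [list(init1), list(init2)]
    for _ in range(steps):
        pis = [softmax(theta[0]), softmax(theta[1])]
        grads = []
        for i in range(2):
            pi, partner = pis[i], pis[1 - i]
            # Expected reward of each own action against the partner's policy.
            q = [sum(partner[b] * (REWARD[a][b] if i == 0 else REWARD[b][a])
                     for b in range(2))
                 for a in range(2)]
            # Exact softmax gradient of the entropy-regularized objective:
            # d/dtheta[a] = pi[a] * (adj[a] - sum_b pi[b] * adj[b]).
            adj = [q[a] - tau * math.log(pi[a]) for a in range(2)]
            baseline = sum(pi[a] * adj[a] for a in range(2))
            grads.append([pi[a] * (adj[a] - baseline) for a in range(2)])
        for i in range(2):
            for a in range(2):
                theta[i][a] += lr * grads[i][a]
    return softmax(theta[0]), softmax(theta[1])


def cross_play(pi1, pi2):
    """Expected shared reward when agent 1 plays pi1 and agent 2 plays pi2."""
    return sum(pi1[a] * pi2[b] * REWARD[a][b]
               for a in range(2) for b in range(2))


for tau in (0.1, 1.0):
    run_a = train([1.0, 0.0], [1.0, 0.0], tau)  # both agents start favoring action 0
    run_b = train([0.0, 1.0], [0.0, 1.0], tau)  # both agents start favoring action 1
    # Pair agent 1 from run A with agent 2 from run B.
    print(f"tau={tau}: cross-play return {cross_play(run_a[0], run_b[1]):.3f}")
```

The `tau=0.1` line should print a value near 0 and the `tau=1.0` line 0.500. In this toy, the compatibility has a price: uniform play earns 0.5, whereas a shared convention earns close to 1.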

Engagement Process: Rethinking the Temporal Interface of Action and Observation

Abstract: Task completion in digital and physical environments increasingly involves complex temporal interaction, where actions and observations unfold over different time scales rather than align with fixed observation-action steps. To model such interactions, we propose Engagement Process (EP), an interaction formalism that inherits the decision-theoretic structure of POMDPs while making time explicit in the action-observation interface. EP represents...

arXiv CS 1d ago

PatchWorld: Gradient-Free Optimization of Executable World Models

arXiv:2605.30880v1. Abstract: Text-agent environments are typically modeled as partially observable Markov decision processes (POMDPs), assuming that the simulator's latent state and transition dynamics are hidden from the agent. Yet little work has examined whether executable code can be induced to serve as a world model for prediction and planning under partial observability. We introduce PatchWorld, a gradient-free framework that turns offline trajectories into...

arXiv CS 9d ago

Spatiotemporal Decoding of Explore-Exploit Decisions in the Human Brain

Adaptive behavior requires flexibly shifting between exploiting familiar rewards and exploring novel opportunities. These explore-exploit decisions are implemented via a distributed brain network, anchored in frontopolar cortex (FPC) and ventromedial prefrontal cortex (vmPFC), that computes the total value of a given choice by weighting the immediate value of familiar options against the latent future value of exploration. Capturing the precise temporal dynamics of these neural computations...

bioRxiv 7d ago

NestRL: A Nested Training Regime for Mutual Adaptation in Human-AI Teaming

arXiv:2602.17737v2. Abstract: Mutual adaptation is a central challenge in human-AI teaming, as humans naturally adjust their strategies in response to an AI agent's behavior. Existing approaches attempt to approximate human behavior by diversifying training partners; however, these partners are typically static and fail to capture the adaptive nature of human teammates. When agents are trained jointly in standard multi-agent settings, they often converge to opaque...

arXiv CS 8d ago

Adaptive Sensing beyond Non-Adaptive Information Limits: End-to-End Co-Design of Geometry, Policy, and Inference

arXiv:2604.25193v2. Abstract: Inverse design has transformed vast physical parameter spaces into a substrate for emergent functionality, raising the tantalizing prospect of relocating intelligence from the digital domain into the physical world itself. Nowhere is this prospect more consequential than in sensing, where the analog-to-digital interface imposes a fundamental bottleneck: information not captured by the hardware is irrevocably lost to any downstream...

arXiv Physics 1d ago