Voronoi Progressive Widening: Efficient Online Solvers for Continuous Space MDPs and POMDPs with Provably Optimal Components

12/18/2020
by Michael H. Lim, et al.

Markov decision processes (MDPs) and partially observable MDPs (POMDPs) can effectively represent complex real-world decision and control problems. However, continuous space MDPs and POMDPs, i.e., those with continuous state, action, and observation spaces, are extremely difficult to solve, and few online algorithms for them come with convergence guarantees. This paper introduces Voronoi Progressive Widening (VPW), a general technique for modifying tree search algorithms to handle continuous or hybrid action spaces effectively, and proposes and evaluates three continuous space solvers: VOSS, VOWSS, and VOMCPOW. VOSS and VOWSS are theoretical tools based on sparse sampling and Voronoi optimistic optimization, designed to justify VPW-based online solvers. While previous algorithms have enjoyed convergence guarantees for problems with continuous state and observation spaces, VOWSS is the first with global convergence guarantees for problems that additionally have continuous action spaces. VOMCPOW is a versatile and efficient VPW-based algorithm that consistently outperforms POMCPOW and BOMCP in several simulation experiments.
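The abstract describes VPW only at a high level. As a rough illustration of the underlying idea: progressive widening grows a tree node's action set only when the node's visit count justifies a new child, and a Voronoi-guided variant biases each newly sampled action toward the Voronoi cell of the current best action (the region of the action space closer to it than to any other sampled action). The sketch below is not the paper's algorithm; it assumes a 1-D action space on `[lo, hi]`, approximates the Voronoi cell by rejection sampling, and uses illustrative parameter defaults `k` and `alpha`.

```python
import random


def pw_should_widen(num_children, num_visits, k=2.0, alpha=0.5):
    # Standard progressive widening criterion: add a new action to a
    # node only while the child count stays below k * N^alpha.
    return num_children <= k * (num_visits ** alpha)


def vpw_sample_action(actions, values, lo, hi, rng, max_tries=100):
    # Voronoi-style widening (sketch): sample the next action from the
    # Voronoi cell of the current best action, i.e. points closer to it
    # than to any other already-sampled action. Rejection sampling
    # approximates the cell; the 1-D interval action space is an
    # illustrative assumption, not part of the paper.
    if not actions:
        return rng.uniform(lo, hi)
    best = actions[max(range(len(actions)), key=lambda i: values[i])]
    for _ in range(max_tries):
        cand = rng.uniform(lo, hi)
        if all(abs(cand - best) <= abs(cand - a) for a in actions if a != best):
            return cand
    return rng.uniform(lo, hi)  # fall back to uniform if the cell is tiny


if __name__ == "__main__":
    rng = random.Random(0)
    actions, values = [0.2, 0.8], [1.0, 0.1]
    if pw_should_widen(len(actions), num_visits=9):
        new_action = vpw_sample_action(actions, values, 0.0, 1.0, rng)
        print(new_action)  # lands closer to the best action (0.2) than to 0.8
```

The design intent mirrored here is that widening samples concentrate near the incumbent optimum rather than being spread uniformly, which is what lets Voronoi optimistic optimization retain global convergence arguments.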


Related research:

10/10/2019 · Sparse tree search optimality guarantees in POMDPs with continuous observation spaces
  Partially observable Markov decision processes (POMDPs) with continuous ...

12/23/2022 · Online Planning for Constrained POMDPs with Continuous Spaces through Dual Ascent
  Rather than augmenting rewards with penalties for undesired behavior, Co...

09/18/2017 · POMCPOW: An online algorithm for POMDPs with continuous state, action, and observation spaces
  Online solvers for partially observable Markov decision processes have b...

10/07/2020 · Improved POMDP Tree Search Planning with Prioritized Action Branching
  Online solvers for partially observable Markov decision processes have d...

05/20/2020 · MDPs with Unawareness in Robotics
  We formalize decision-making problems in robotics and automated control ...

10/10/2022 · Generalized Optimality Guarantees for Solving Continuous Observation POMDPs through Particle Belief MDP Approximation
  Partially observable Markov decision processes (POMDPs) provide a flexib...

11/12/2021 · Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity
  Reinforcement learning algorithms often require finiteness of state and ...
