Vulcan: A Monte Carlo Algorithm for Large Chance Constrained MDPs with Risk Bounding Functions

09/04/2018
by Benjamin J Ayton, et al.

Chance Constrained Markov Decision Processes (CCMDPs) maximize reward subject to a bounded probability of failure, and have frequently been applied to planning with potentially dangerous outcomes or unknown environments. Solution algorithms have required strong heuristics or have been limited to relatively small problems with up to millions of states, because the optimal action to take from a given state depends on the probability of failure in the rest of the policy, leading to a coupled problem that is difficult to solve. In this paper we examine a generalization of a CCMDP that trades off probability of failure against reward through a functional relationship. We derive a constraint that can be applied to each state history in a policy individually, and which guarantees that the chance constraint will be satisfied. The approach decouples states in the CCMDP, so that large problems can be solved efficiently. We then introduce Vulcan, which uses our constraint to apply Monte Carlo Tree Search to CCMDPs. Vulcan can be applied to problems where it is infeasible to generate the entire state space and policies must be returned in an anytime manner. We show that Vulcan and its variants run tens to hundreds of times faster than linear programming methods, and over ten times faster than heuristic-based methods, all without the need for a heuristic, while returning solutions with a mean suboptimality on the order of a few percent. Finally, we use Vulcan to solve for a chance constrained policy in a CCMDP with over 10^13 states in 3 minutes.
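To make the notion of a chance constraint concrete, the following is a minimal sketch, not the Vulcan algorithm itself. It assumes a hypothetical toy chain MDP (the names GOAL, P_FAIL_STEP, STEP_REWARD and all functions are illustrative and do not come from the paper), estimates a fixed policy's failure probability by Monte Carlo rollouts, and compares the estimate against an assumed risk bound delta.

```python
import random

# Hypothetical toy chain MDP (an assumption for illustration, not from the paper):
# the agent walks from state 0 to GOAL; every step fails with probability P_FAIL_STEP.
GOAL = 5
P_FAIL_STEP = 0.02
STEP_REWARD = 1.0


def rollout(rng):
    """Simulate one episode under the single 'move forward' policy.

    Returns (total_reward, failed).
    """
    state, total = 0, 0.0
    while state < GOAL:
        if rng.random() < P_FAIL_STEP:
            return total, True  # failure terminates the episode
        state += 1
        total += STEP_REWARD
    return total, False


def estimate(n_rollouts, seed=0):
    """Monte Carlo estimates of (expected reward, failure probability)."""
    rng = random.Random(seed)
    reward_sum, failures = 0.0, 0
    for _ in range(n_rollouts):
        reward, failed = rollout(rng)
        reward_sum += reward
        failures += failed
    return reward_sum / n_rollouts, failures / n_rollouts


def satisfies_chance_constraint(p_fail_estimate, delta):
    """Check the estimated failure probability against the risk bound delta."""
    return p_fail_estimate <= delta


if __name__ == "__main__":
    mean_reward, p_fail = estimate(20000, seed=1)
    print(mean_reward, p_fail, satisfies_chance_constraint(p_fail, 0.2))
```

In this toy setting the exact failure probability is 1 - 0.98^5, so the estimate can be checked by hand. The sketch evaluates only one fixed policy; the coupling described above arises when the policy itself must be chosen so that the bound holds.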

