DeepAI AI Chat
Log In Sign Up

Vulcan: A Monte Carlo Algorithm for Large Chance Constrained MDPs with Risk Bounding Functions

by   Benjamin J Ayton, et al.

Chance Constrained Markov Decision Processes maximize reward subject to a bounded probability of failure, and have been frequently applied for planning with potentially dangerous outcomes or unknown environments. Solution algorithms have required strong heuristics or have been limited to relatively small problems with up to millions of states, because the optimal action to take from a given state depends on the probability of failure in the rest of the policy, leading to a coupled problem that is difficult to solve. In this paper we examine a generalization of a CCMDP that trades off probability of failure against reward through a functional relationship. We derive a constraint that can be applied to each state history in a policy individually, and which guarantees that the chance constraint will be satisfied. The approach decouples states in the CCMDP, so that large problems can be solved efficiently. We then introduce Vulcan, which uses our constraint in order to apply Monte Carlo Tree Search to CCMDPs. Vulcan can be applied to problems where it is unfeasible to generate the entire state space, and policies must be returned in an anytime manner. We show that Vulcan and its variants run tens to hundreds of times faster than linear programming methods, and over ten times faster than heuristic based methods, all without the need for a heuristic, and returning solutions with a mean suboptimality on the order of a few percent. Finally, we use Vulcan to solve for a chance constrained policy in a CCMDP with over 10^13 states in 3 minutes.


Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Markov decision processes (MDPs) are the defacto frame-work for sequenti...

Factoring Exogenous State for Model-Free Monte Carlo

Policy analysts wish to visualize a range of policies for large simulato...

Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes

Policy gradient (PG) is a reinforcement learning (RL) approach that opti...

Monte Carlo Tree Search for Verifying Reachability in Markov Decision Processes

The maximum reachability probabilities in a Markov decision process can ...

Simple Regret Optimization in Online Planning for Markov Decision Processes

We consider online planning in Markov decision processes (MDPs). In onli...

FluCaP: A Heuristic Search Planner for First-Order MDPs

We present a heuristic search algorithm for solving first-order Markov D...

New inference strategies for solving Markov Decision Processes using reversible jump MCMC

In this paper we build on previous work which uses inferences techniques...