Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

by Tomáš Brázdil, et al.

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and with failure states that represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability of encountering a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
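To make the idea of risk-constrained action selection concrete, the following is a minimal sketch, not the authors' algorithm. The abstract does not spell out the linear program, so this example assumes a simplified single-step version: each action has an estimated payoff and an estimated failure probability, and we look for a distribution over actions that maximizes expected payoff while keeping expected risk at or below a threshold. The function name `select_action_distribution` and its parameters are hypothetical. Because this simplified LP has only one risk constraint besides the probability simplex, an optimal vertex mixes at most two actions, so the sketch enumerates vertices directly instead of calling an LP solver.

```python
from itertools import combinations


def select_action_distribution(values, risks, threshold):
    """Maximize sum(p[a] * values[a]) subject to
    sum(p[a] * risks[a]) <= threshold, with p a probability distribution.

    Hypothetical helper for illustration only. Returns a pair
    (distribution, expected_value), or None if no action mix is safe enough.
    """
    n = len(values)
    best = None  # (expected value, distribution)

    def consider(dist):
        nonlocal best
        value = sum(p * v for p, v in zip(dist, values))
        if best is None or value > best[0]:
            best = (value, dist)

    # Vertices that play a single action which is safe on its own.
    for a in range(n):
        if risks[a] <= threshold:
            dist = [0.0] * n
            dist[a] = 1.0
            consider(dist)

    # Vertices that mix a safe action with a riskier one so that the
    # expected risk equals the threshold exactly.
    for a, b in combinations(range(n), 2):
        lo, hi = (a, b) if risks[a] <= risks[b] else (b, a)
        if risks[lo] < threshold < risks[hi]:
            w = (threshold - risks[lo]) / (risks[hi] - risks[lo])
            dist = [0.0] * n
            dist[lo] = 1.0 - w  # weight on the safer action
            dist[hi] = w        # weight on the riskier action
            consider(dist)

    if best is None:
        return None
    return best[1], best[0]
```

For instance, with payoffs `[1.0, 5.0]`, risks `[0.0, 0.4]`, and a threshold of `0.1`, the sketch puts about a quarter of the probability mass on the riskier, higher-payoff action, which is the most it can use without exceeding the risk bound. A full planner would apply such a constraint across a search tree rather than to a single decision.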



Constrained Risk-Averse Markov Decision Processes

We consider the problem of designing policies for Markov decision proces...

Vulcan: A Monte Carlo Algorithm for Large Chance Constrained MDPs with Risk Bounding Functions

Chance Constrained Markov Decision Processes maximize reward subject to ...

Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-sum Objectives

Partially-observable Markov decision processes (POMDPs) with discounted-...

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

In many sequential decision-making problems one is interested in minimiz...

Optimizing Expectation with Guarantees in POMDPs (Technical Report)

A standard objective in partially-observable Markov decision processes (...

Soft-Robust Algorithms for Handling Model Misspecification

In reinforcement learning, robust policies for high-stakes decision-maki...

Transition Tensor Markov Decision Processes: Analyzing Shot Policies in Professional Basketball

In this paper we model basketball plays as episodes from team-specific n...