Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

02/27/2020
by   Tomáš Brázdil, et al.
0

Markov decision processes (MDPs) are the defacto frame-work for sequential decision making in the presence ofstochastic uncertainty. A classical optimization criterion forMDPs is to maximize the expected discounted-sum pay-off, which ignores low probability catastrophic events withhighly negative impact on the system. On the other hand,risk-averse policies require the probability of undesirableevents to be below a given threshold, but they do not accountfor optimization of the expected payoff. We consider MDPswith discounted-sum payoff with failure states which repre-sent catastrophic outcomes. The objective of risk-constrainedplanning is to maximize the expected discounted-sum payoffamong risk-averse policies that ensure the probability to en-counter a failure state is below a desired threshold. Our maincontribution is an efficient risk-constrained planning algo-rithm that combines UCT-like search with a predictor learnedthrough interaction with the MDP (in the style of AlphaZero)and with a risk-constrained action selection via linear pro-gramming. We demonstrate the effectiveness of our approachwith experiments on classical MDPs from the literature, in-cluding benchmarks with an order of 10^6 states.

READ FULL TEXT

page 1

page 2

page 3

page 4

12/04/2020

Constrained Risk-Averse Markov Decision Processes

We consider the problem of designing policies for Markov decision proces...
09/04/2018

Vulcan: A Monte Carlo Algorithm for Large Chance Constrained MDPs with Risk Bounding Functions

Chance Constrained Markov Decision Processes maximize reward subject to ...
04/27/2018

Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-sum Objectives

Partially-observable Markov decision processes (POMDPs) with discounted-...
12/05/2015

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

In many sequential decision-making problems one is interested in minimiz...
11/26/2016

Optimizing Expectation with Guarantees in POMDPs (Technical Report)

A standard objective in partially-observable Markov decision processes (...
11/30/2020

Soft-Robust Algorithms for Handling Model Misspecification

In reinforcement learning, robust policies for high-stakes decision-maki...
12/12/2018

Transition Tensor Markov Decision Processes: Analyzing Shot Policies in Professional Basketball

In this paper we model basketball plays as episodes from team-specific n...