Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures

01/14/2023
by Xian Yu, et al.

Traditional reinforcement learning (RL) aims to maximize the expected total reward, but in a risk-averse setting the risk of uncertain outcomes must also be controlled to ensure reliable performance. In this paper, we consider the problem of maximizing a dynamic risk measure of a sequence of rewards in infinite-horizon Markov Decision Processes (MDPs). We adapt the Expected Conditional Risk Measures (ECRMs) to the infinite-horizon risk-averse MDP and prove their time consistency. Using a convex combination of expectation and conditional value-at-risk (CVaR) as a special one-step conditional risk measure, we reformulate the risk-averse MDP as a risk-neutral counterpart with an augmented action space and modified immediate rewards. We further prove that the related Bellman operator is a contraction mapping, which guarantees the convergence of any value-based RL algorithm. Accordingly, we develop a risk-averse deep Q-learning framework, and our numerical studies on two simple MDPs show that the risk-averse setting can reduce the variance and enhance the robustness of the results.
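
To make the one-step risk measure mentioned in the abstract concrete, the following is a minimal sketch of a convex combination of expectation and CVaR evaluated on a finite sample of rewards. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact definition, so the lower-tail (worst-reward) CVaR convention, the Rockafellar-Uryasev variational form, and the names `cvar_lower_tail`, `mean_cvar`, `alpha`, and `lam` are all assumptions made here. The auxiliary variable `eta` in the variational form is the kind of extra decision that an augmented action space could carry, though the abstract does not spell out that construction.

```python
import numpy as np


def cvar_lower_tail(rewards, alpha):
    """Empirical lower-tail CVaR of rewards at level alpha (assumed convention).

    Uses the Rockafellar-Uryasev form for rewards:
        CVaR_alpha(X) = max_eta { eta - E[(eta - X)^+] / alpha }.
    """
    x = np.asarray(rewards, dtype=float)
    # The objective is concave and piecewise linear in eta, with kinks only
    # at the sample points, so it suffices to evaluate it at each sample.
    values = [eta - np.mean(np.maximum(eta - x, 0.0)) / alpha for eta in x]
    return max(values)


def mean_cvar(rewards, alpha, lam):
    """Convex combination (1 - lam) * E[X] + lam * CVaR_alpha(X), lam in [0, 1]."""
    x = np.asarray(rewards, dtype=float)
    return (1.0 - lam) * x.mean() + lam * cvar_lower_tail(x, alpha)


if __name__ == "__main__":
    sample = list(range(10))  # hypothetical rewards 0, 1, ..., 9
    print(mean_cvar(sample, alpha=0.2, lam=0.0))  # risk-neutral: plain mean
    print(mean_cvar(sample, alpha=0.2, lam=0.5))  # equal weight on mean and CVaR
```

With `lam = 0` the measure reduces to the ordinary expectation, and raising `lam` shifts weight toward the average of the worst `alpha` fraction of rewards, which is one way to read the trade-off between expected return and tail risk described above.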


research
10/25/2021

Lexicographic Optimisation of Conditional Value at Risk and Expected Value for Risk-Averse Planning in MDPs

Planning in Markov decision processes (MDPs) typically optimises the exp...
research
09/07/2015

Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures

In this paper, we consider a finite-horizon Markov decision process (MDP...
research
10/22/2018

Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

The classic objective in a reinforcement learning (RL) problem is to fin...
research
09/09/2022

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Prior work on safe Reinforcement Learning (RL) has studied risk-aversion...
research
01/29/2023

Conditional generalized quantiles based on expected utility model and equivalent characterization of properties

As a counterpart to the (static) risk measures of generalized quantiles ...
research
11/12/2021

Two steps to risk sensitivity

Distributional reinforcement learning (RL) – in which agents learn about...
research
07/09/2019

Variance-Based Risk Estimations in Markov Processes via Transformation with State Lumping

Variance plays a crucial role in risk-sensitive reinforcement learning, ...
