On the Global Convergence of Risk-Averse Policy Gradient Methods with Dynamic Time-Consistent Risk Measures

01/26/2023
by   Xian Yu, et al.
0

Risk-sensitive reinforcement learning (RL) has become a popular tool to control the risk of uncertain outcomes and ensure reliable performance in various sequential decision-making problems. While policy gradient methods have been developed for risk-sensitive RL, it remains unclear if these methods enjoy the same global convergence guarantees as in the risk-neutral case. In this paper, we consider a class of dynamic time-consistent risk measures, called Expected Conditional Risk Measures (ECRMs), and derive policy gradient updates for ECRM-based objective functions. Under both constrained direct parameterization and unconstrained softmax parameterization, we provide global convergence of the corresponding risk-averse policy gradient algorithms. We further test a risk-averse variant of REINFORCE algorithm on a stochastic Cliffwalk environment to demonstrate the efficacy of our algorithm and the importance of risk control.

READ FULL TEXT

page 36

page 37

page 38

research
07/09/2021

Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis

We propose policy-gradient algorithms for solving the problem of control...
research
02/13/2015

Policy Gradient for Coherent Risk Measures

Several authors have recently developed risk-sensitive policy gradient m...
research
03/04/2021

On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk

In order to model risk aversion in reinforcement learning, an emerging l...
research
06/14/2019

Epistemic Risk-Sensitive Reinforcement Learning

We develop a framework for interacting with uncertain environments in re...
research
08/22/2019

Practical Risk Measures in Reinforcement Learning

Practical application of Reinforcement Learning (RL) often involves risk...
research
08/19/2022

A Risk-Sensitive Approach to Policy Optimization

Standard deep reinforcement learning (DRL) aims to maximize expected rew...
research
10/03/2020

Policy Gradient with Expected Quadratic Utility Maximization: A New Mean-Variance Approach in Reinforcement Learning

In real-world decision-making problems, risk management is critical. Amo...

Please sign up or login with your details

Forgot password? Click here to reset