A Natural Actor-Critic Algorithm with Downside Risk Constraints

07/08/2020
by   Thomas Spooner, et al.

Existing work on risk-sensitive reinforcement learning, for both symmetric and downside risk measures, has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also suffers from high variance and decreased sample efficiency compared to temporal-difference methods. In this paper, we study prediction and control with aversion to downside risk, which we gauge by the lower partial moment of the return. We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity. We prove that this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition. This allows sample-efficient, on-line estimation of partial moments. For risk-sensitive control, we instantiate Reward Constrained Policy Optimization, a recent actor-critic method for finding constrained policies, with our proxy for the lower partial moment. We extend the method to use natural policy gradients and demonstrate the effectiveness of our approach on three benchmark problems for risk-sensitive reinforcement learning.
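
For orientation, the lower partial moment of order m about a target τ is commonly defined as E[max(τ - G, 0)^m], where G is the return. The sketch below shows only the direct Monte-Carlo estimate that the abstract contrasts with temporal-difference methods; it is not the paper's Bellman-equation proxy. The function name, the target of 0 and the order of 2 are illustrative assumptions, not the authors' notation.

```python
def lower_partial_moment(returns, target=0.0, order=2):
    """Empirical lower partial moment of sampled returns.

    Averages max(target - G, 0) ** order over the samples, so only
    returns that fall below the target contribute to the estimate.
    """
    if not returns:
        raise ValueError("need at least one sampled return")
    shortfalls = [max(target - g, 0.0) for g in returns]
    return sum(s ** order for s in shortfalls) / len(shortfalls)


# Hypothetical episode returns, used only to illustrate the call.
sampled_returns = [1.0, -1.0, 3.0, -3.0]
lpm = lower_partial_moment(sampled_returns, target=0.0, order=2)
```

Because returns above the target contribute nothing, this measure penalises only downside deviations, unlike a symmetric measure such as variance.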


