Efficient Risk-Averse Reinforcement Learning

05/10/2022
by Ido Greenberg, et al.
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns. A risk measure often focuses on the worst returns in the agent's experience. As a result, standard methods for risk-averse RL often ignore high-return strategies. We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it. We also devise a novel Cross Entropy module for risk sampling, which (1) preserves risk aversion despite the soft risk and (2) independently improves sample efficiency. By separating the risk aversion of the sampler from that of the optimizer, we can sample episodes with poor conditions, yet optimize with respect to successful strategies. We combine these two concepts in CeSoR (Cross-entropy Soft-Risk optimization algorithm), which can be applied on top of any risk-averse policy gradient (PG) method. We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks, including in scenarios where standard risk-averse PG completely fails.
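
To make "focuses on the worst returns" concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the risk measure is the conditional value at risk (CVaR), which the abstract does not name, and it illustrates one possible "soft risk" idea as a linear schedule that starts risk-neutral and tightens toward the target risk level. The function names `cvar` and `soft_risk_level`, the parameter `warmup_fraction`, and the linear shape of the schedule are all hypothetical.

```python
import math


def cvar(returns, alpha):
    """Empirical CVaR_alpha: mean of the worst ceil(alpha * n) returns.

    Assumption: CVaR stands in for the generic "risk measure" of the text.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    worst_first = sorted(returns)
    k = max(1, math.ceil(alpha * len(worst_first)))
    return sum(worst_first[:k]) / k


def soft_risk_level(step, total_steps, alpha, warmup_fraction=0.8):
    """Hypothetical soft-risk schedule (an illustrative assumption).

    Starts at 1.0 (all episodes count, i.e. risk-neutral) and decreases
    linearly to the target level alpha by the end of the warmup period.
    """
    warmup = max(1, int(warmup_fraction * total_steps))
    progress = min(1.0, step / warmup)
    return 1.0 - progress * (1.0 - alpha)


if __name__ == "__main__":
    episode_returns = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for step in (0, 40, 100):
        level = soft_risk_level(step, total_steps=100, alpha=0.2)
        print(step, round(level, 3), cvar(episode_returns, level))
```

Early in training the effective level is close to 1, so high-return episodes still contribute to the objective; by the end only the worst `alpha` fraction does. In this sketch, that is how a soft schedule could avoid discarding successful strategies too early.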
