Risk-Averse Offline Reinforcement Learning

02/10/2021
by Núria Armengol Urpí, et al.

Training Reinforcement Learning (RL) agents in high-stakes applications can be prohibitive because of the risk associated with exploration. In such settings, the agent can only use data previously collected by safe policies. While previous work optimizes average performance using offline data, we focus on optimizing a risk-averse criterion, namely the Conditional Value-at-Risk (CVaR). In particular, we present the Offline Risk-Averse Actor-Critic (O-RAAC), a model-free RL algorithm that is able to learn risk-averse policies in a fully offline setting. We show that O-RAAC learns policies with higher CVaR than risk-neutral approaches in different robot control tasks. Furthermore, considering risk-averse criteria guarantees distributional robustness of the average performance with respect to particular distribution shifts. We demonstrate empirically that in the presence of natural distribution shifts, O-RAAC learns policies with good average performance.
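To make the CVaR criterion concrete, the following is a minimal sketch of an empirical CVaR estimate over sampled episode returns, taken here as the mean of the worst alpha-fraction of returns. It is not the O-RAAC algorithm itself; the function name `empirical_cvar`, the confidence level `alpha`, and the two return arrays are hypothetical and chosen only for illustration.

```python
import numpy as np


def empirical_cvar(returns, alpha=0.1):
    """Estimate CVaR at level alpha as the mean of the worst alpha-fraction of returns.

    Assumption: higher returns are better, so the "worst" outcomes are the
    lowest values in the sample.
    """
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    # Number of samples in the lower tail; always keep at least one.
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()


# Hypothetical episode returns for two policies with the same average.
safe = np.array([9.0, 10.0, 10.0, 11.0, 10.0, 10.0, 9.0, 11.0, 10.0, 10.0])
risky = np.array([15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, -35.0])

print("mean:", safe.mean(), risky.mean())
print("CVaR(0.2):", empirical_cvar(safe, 0.2), empirical_cvar(risky, 0.2))
```

In this toy example both policies have the same mean return, so a risk-neutral criterion cannot tell them apart, whereas the CVaR at level 0.2 is much lower for the policy with the rare catastrophic episode.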

research
07/12/2021

Conservative Offline Distributional Reinforcement Learning

Many reinforcement learning (RL) problems in practice are offline, learn...
research
05/17/2021

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Offline Reinforcement Learning promises to learn effective policies from...
research
01/14/2022

Evaluating the root causes of fatigue and associated risk factors in the Brazilian regular aviation industry

This work evaluates the potential root causes of fatigue using a biomath...
research
12/14/2020

Learning how to approve updates to machine learning algorithms in non-stationary settings

Machine learning algorithms in healthcare have the potential to continua...
research
04/07/2021

Risk-Conditioned Distributional Soft Actor-Critic for Risk-Sensitive Navigation

Modern navigation algorithms based on deep reinforcement learning (RL) s...
research
12/05/2022

Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation

Amazon and other e-commerce sites must employ mechanisms to protect thei...
research
07/26/2022

Offline Reinforcement Learning at Multiple Frequencies

Leveraging many sources of offline robot data requires grappling with th...
