Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

06/09/2022
by Chengyang Ying, et al.

Though deep reinforcement learning (DRL) has achieved substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transitions and observations. Most existing methods for safe reinforcement learning can handle only transition disturbance or only observation disturbance, since these two kinds of disturbance affect different parts of the agent; moreover, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first prove theoretically that the performance degradation under transition disturbance and observation disturbance depends on a novel metric, the Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on this analysis, we adopt conditional value-at-risk (CVaR) as an assessment of risk and propose a novel reinforcement learning algorithm, CVaR-Proximal-Policy-Optimization (CPPO), which formalizes the risk-sensitive constrained optimization problem by keeping its CVaR under a given threshold. Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances on a series of continuous control tasks in MuJoCo.
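To make the two quantities in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that VFR is estimated over a finite set of sampled state values, that CVaR is taken over the loss (negative return) as the mean of the worst alpha fraction of episodes, and that the constraint is checked as a simple violation amount. The function names, the `alpha` level, and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def value_function_range(state_values):
    """Gap between the best and worst state value (illustrative VFR estimate)."""
    v = np.asarray(state_values, dtype=float)
    return float(v.max() - v.min())


def empirical_cvar(losses, alpha):
    """Mean of the worst alpha-fraction of losses (upper tail of the loss)."""
    sorted_losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # largest first
    k = max(1, int(np.ceil(alpha * sorted_losses.size)))  # tail size, at least 1
    return float(sorted_losses[:k].mean())


def cvar_violation(returns, alpha, threshold):
    """Amount by which the CVaR of the negative return exceeds the threshold.

    Assumption: loss = -return, and the constraint is CVaR(loss) <= threshold.
    Returns 0.0 when the constraint is satisfied.
    """
    losses = -np.asarray(returns, dtype=float)
    return max(0.0, empirical_cvar(losses, alpha) - threshold)
```

For example, with episode returns `[-4, -3, -2, -1]` and `alpha=0.5`, the two worst episodes have losses 4 and 3, so the empirical CVaR is 3.5. A constrained method in the spirit of the abstract would then penalize or restrict policy updates whenever `cvar_violation` is positive; how CPPO does this is specified in the full paper, not in this sketch.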

research
02/20/2023

Safe Deep Reinforcement Learning by Verifying Task-Level Properties

Cost functions are commonly employed in Safe Deep Reinforcement Learning...
research
11/09/2019

Worst Cases Policy Gradients

Recent advances in deep reinforcement learning have demonstrated the cap...
research
10/04/2019

Discounted Reinforcement Learning is Not an Optimization Problem

Discounted reinforcement learning is fundamentally incompatible with fun...
research
06/06/2022

Risk-Sensitive Reinforcement Learning: Iterated CVaR and the Worst Path

In this paper, we study a novel episodic risk-sensitive Reinforcement Le...
research
04/16/2022

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Bayesian policy reuse (BPR) is a general policy transfer framework for s...
research
06/09/2023

Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions

Learning in MDPs with highly complex state representations is currently ...
research
08/24/2023

Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory

Risk-sensitive reinforcement learning (RL) has garnered significant atte...
