Efficient Trust Region-Based Safe Reinforcement Learning with Low-Bias Distributional Actor-Critic

01/26/2023
by Dohyeong Kim, et al.

Applying reinforcement learning (RL) to real-world problems requires agents to adhere to the safety guidelines of their respective domains. Safe RL handles such guidelines by converting them into constraints of the RL problem. In this paper, we develop a safe distributional RL method based on the trust region method, which can satisfy constraints consistently. However, policies may fail to meet the safety guidelines because of the estimation bias of distributional critics, and the importance sampling required by the trust region method can hinder performance because of its high variance. We therefore improve safety performance in two ways. First, we train distributional critics to have low estimation bias using proposed target distributions in which bias and variance can be traded off. Second, we propose novel surrogates for the trust region method, expressed with Q-functions using the reparameterization trick. Additionally, depending on the initial policy, there may be no policy within the trust region that satisfies the constraints. To handle this infeasibility issue, we propose a gradient integration method that is guaranteed to find a policy satisfying all constraints from an unsafe initial policy. In extensive experiments, the proposed method with risk-averse constraints shows minimal constraint violations while achieving high returns compared to existing safe RL methods.
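The abstract does not give the form of the proposed target distributions, so the following is only a scalar analogue of the bias-variance trade-off it mentions: the standard TD(lambda) return, not the paper's distributional target. The function name `lambda_returns` and its parameters are hypothetical, chosen for illustration. With `lam=0` the target bootstraps from the critic after a single step (lower variance, more critic bias); with `lam=1` it follows the sampled rewards to the end of the trajectory (less bias, more variance).

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.9):
    """Scalar TD(lambda) targets computed by backward recursion.

    rewards[t]     : reward received at step t
    next_values[t] : critic estimate for the state reached after step t
    Illustrative only; not the distributional target from the paper.
    """
    targets = [0.0] * len(rewards)
    # At the end of the trajectory there is no later target, so bootstrap
    # from the critic's estimate of the final state.
    next_target = next_values[-1]
    for t in reversed(range(len(rewards))):
        # Blend the critic estimate (weight 1 - lam) with the longer
        # sampled return (weight lam).
        blended = (1.0 - lam) * next_values[t] + lam * next_target
        targets[t] = rewards[t] + gamma * blended
        next_target = targets[t]
    return targets


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    next_values = [0.0, 0.0, 4.0]
    print(lambda_returns(rewards, next_values, gamma=0.5, lam=0.0))
    print(lambda_returns(rewards, next_values, gamma=0.5, lam=1.0))
```

Intermediate values of `lam` interpolate between the two extremes, which is the kind of tunable trade-off the abstract refers to.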

research
02/26/2021

Safe Distributional Reinforcement Learning

Safety in reinforcement learning (RL) is a key property in both training...
research
04/20/2022

SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics

Although Reinforcement Learning (RL) is effective for sequential decisio...
research
10/14/2022

Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

Safe reinforcement learning (RL) that solves constraint-satisfactory pol...
research
02/03/2023

Distributional constrained reinforcement learning for supply chain optimization

This work studies reinforcement learning (RL) in the context of multi-pe...
research
12/19/2020

Model-Based Actor-Critic with Chance Constraint for Stochastic System

Safety constraints are essential for reinforcement learning (RL) applied...
research
01/28/2022

Towards Safe Reinforcement Learning with a Safety Editor Policy

We consider the safe reinforcement learning (RL) problem of maximizing u...
research
04/07/2021

Risk-Conditioned Distributional Soft Actor-Critic for Risk-Sensitive Navigation

Modern navigation algorithms based on deep reinforcement learning (RL) s...
