Towards Safe Reinforcement Learning with a Safety Editor Policy

01/28/2022
by   Haonan Yu, et al.

We consider the safe reinforcement learning (RL) problem of maximizing utility while satisfying provided constraints. Since we do not assume any prior knowledge or pre-training of the safety concept, we are interested in asymptotic constraint satisfaction. A popular approach in this line of research is to combine the Lagrangian method with a model-free RL algorithm to adjust the weight of the constraint reward dynamically. It relies on a single policy to handle the conflict between utility and constraint rewards, which is often challenging. Inspired by the safety layer design (Dalal et al., 2018), we propose to separately learn a safety editor policy that transforms potentially unsafe actions output by a utility maximizer policy into safe ones. The safety editor is trained to maximize the constraint reward while minimizing a hinge loss of the utility Q values of actions before and after the edit. On 12 custom Safety Gym (Ray et al., 2019) tasks and 2 safe racing tasks with very harsh constraint thresholds, our approach demonstrates outstanding utility performance while complying with the constraints. Ablation studies reveal that our two-policy design is critical. Simply doubling the model capacity of typical single-policy approaches will not lead to comparable results. The Q hinge loss is also important in certain circumstances, and replacing it with the usual L2 distance could fail badly.
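The abstract's training signal for the safety editor can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): it shows how a hinge loss on the utility Q values before and after the edit penalizes the editor only when its edit lowers utility, whereas an L2 distance would also punish edits that happen to improve utility.

```python
def q_hinge_loss(q_before: float, q_after: float) -> float:
    """Hinge loss on utility Q values around the safety edit.

    Hypothetical sketch based on the abstract: q_before is the utility
    Q value of the raw action from the utility maximizer policy,
    q_after is the utility Q value of the edited (safe) action.
    The hinge is zero whenever the edit preserves or improves utility.
    """
    return max(0.0, q_before - q_after)


def q_l2_loss(q_before: float, q_after: float) -> float:
    """Symmetric L2 alternative mentioned in the abstract's ablation.

    Unlike the hinge, this also penalizes edits that raise utility.
    """
    return (q_before - q_after) ** 2
```

For an edit that keeps utility intact, `q_hinge_loss(1.0, 1.2)` is 0, while `q_l2_loss(1.0, 1.2)` is still positive; this asymmetry is one plausible reading of why the abstract reports the hinge outperforming L2 in certain circumstances.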


