Improving Safety in Deep Reinforcement Learning using Unsupervised Action Planning

09/29/2021
by Hao-Lun Hsu, et al.

One of the key challenges in deep reinforcement learning (deep RL) is ensuring safety during both training and testing. In this work, we propose a novel unsupervised action planning technique to improve the safety of on-policy reinforcement learning algorithms, such as trust region policy optimization (TRPO) and proximal policy optimization (PPO). Our safety-aware method stores the full history of "recovery" actions that rescued the agent from dangerous situations in a separate "safety" buffer and retrieves the best recovery action when the agent encounters a similar state. Because this requires querying for similar states, we implement the proposed safety mechanism with an unsupervised learning algorithm, k-means clustering. We evaluate the proposed algorithm on six robotic control tasks that cover navigation and manipulation. Our results show that the proposed safe RL algorithm can achieve higher rewards than multiple baselines in both discrete and continuous control problems. The supplemental video can be found at: https://youtu.be/AFTeWSohILo.
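
The buffer-and-query idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SafetyBuffer`, the per-record `score` used to rank recovery actions, the plain Lloyd's k-means with a fixed seed, and the choice to return `None` when no match exists are all assumptions made for illustration. It stores recovery actions with the states they were taken in, clusters those states, and returns the highest-scoring stored action from the cluster nearest to a queried state.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class SafetyBuffer:
    """Hypothetical store of (state, recovery action, score) records."""

    def __init__(self, k):
        self.k = k
        self.records = []
        self.centroids = []

    def add(self, state, action, score):
        # `score` is an assumed measure of how well the action recovered.
        self.records.append((tuple(state), action, score))

    def fit(self):
        """Cluster the stored states so similar states can be looked up."""
        states = [s for s, _, _ in self.records]
        self.centroids = kmeans(states, min(self.k, len(states)))

    def best_recovery_action(self, state):
        """Best-scoring stored action in the cluster nearest to `state`."""
        if not self.centroids:
            return None
        cluster = nearest(state, self.centroids)
        candidates = [r for r in self.records
                      if nearest(r[0], self.centroids) == cluster]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r[2])[1]


if __name__ == "__main__":
    buffer = SafetyBuffer(k=2)
    buffer.add((0, 0), "back_up", 0.2)
    buffer.add((0, 1), "turn_left", 0.9)
    buffer.add((10, 10), "brake", 0.8)
    buffer.add((10, 11), "turn_right", 0.3)
    buffer.fit()
    print(buffer.best_recovery_action((0.5, 0.5)))
```

In this sketch the agent would call `best_recovery_action` only when it detects a dangerous state; how danger is detected and how the retrieved action interacts with the TRPO or PPO policy are left out, since the abstract does not specify them.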


