SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition

02/10/2022
by Dylan Slack, et al.

Though many reinforcement learning (RL) problems involve learning policies in settings with difficult-to-specify safety constraints and sparse rewards, current methods struggle to acquire successful and safe policies. Methods that extract useful policy primitives from offline datasets using generative modeling have recently shown promise at accelerating RL in these more complex settings. However, we discover that current primitive-learning methods may not be well-equipped for safe policy learning and may promote unsafe behavior due to their tendency to ignore data from undesirable behaviors. To overcome these issues, we propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints. Through principled training on an offline dataset, SAFER learns to extract safe primitive skills. In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies. We theoretically characterize why SAFER can enforce safe policy learning and demonstrate its effectiveness on several complex safety-critical robotic grasping tasks inspired by the game Operation, in which SAFER outperforms baseline methods in learning successful policies and enforcing safety.


