Provable Safe Reinforcement Learning with Binary Feedback

10/26/2022
by Andrew Bennett, et al.

Safety is a crucial necessity in many applications of reinforcement learning (RL), whether robotic, automotive, or medical. Many existing approaches to safe RL rely on receiving numeric safety feedback, but in many cases this feedback can only take binary values, i.e., whether an action in a given state is safe or unsafe. This is particularly true when feedback comes from human experts. We therefore consider the problem of provably safe RL when given access to an offline oracle providing binary feedback on the safety of state-action pairs. We provide a novel meta-algorithm, SABRE, which can be applied to any MDP setting given access to a black-box PAC RL algorithm for that setting. SABRE applies concepts from active learning to reinforcement learning to provably control the number of queries to the safety oracle. SABRE works by iteratively exploring the state space to find regions where the agent is currently uncertain about safety. Our main theoretical result shows that, under appropriate technical assumptions, SABRE never takes unsafe actions during training and is guaranteed to return a near-optimal safe policy with high probability. We discuss how our meta-algorithm may be applied to the various settings studied in both theoretical and empirical frameworks.
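The loop the abstract describes, explore to surface state-action pairs whose safety is uncertain, then query the binary oracle only on those, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's algorithm: `safety_oracle`, `candidate_pairs`, and the fixed round count are assumptions, and the real SABRE delegates exploration to a black-box PAC RL subroutine that is not modeled here.

```python
from typing import Callable, Set, Tuple

State = int
Action = int
Pair = Tuple[State, Action]

def sabre_sketch(
    safety_oracle: Callable[[State, Action], bool],  # binary feedback: True iff safe
    candidate_pairs: Set[Pair],
    n_rounds: int = 3,
) -> Set[Pair]:
    """Hypothetical sketch of the SABRE loop: iteratively identify
    state-action pairs with uncertain safety, query the offline binary
    oracle only on those, and grow a certified-safe set."""
    known_safe: Set[Pair] = set()
    known_unsafe: Set[Pair] = set()
    for _ in range(n_rounds):
        # In the real algorithm, a black-box PAC RL subroutine explores
        # to find regions where safety is still uncertain; here we simply
        # take the as-yet-unlabelled candidate pairs.
        uncertain = candidate_pairs - known_safe - known_unsafe
        if not uncertain:
            break  # safety is resolved on every reachable pair
        # Active-learning step: querying only uncertain pairs is how the
        # number of oracle calls is kept under control.
        for sa in uncertain:
            if safety_oracle(*sa):
                known_safe.add(sa)
            else:
                known_unsafe.add(sa)
    return known_safe
```

For example, with an oracle that deems action 0 safe everywhere, `sabre_sketch(lambda s, a: a == 0, {(0, 0), (0, 1), (1, 0)})` certifies `{(0, 0), (1, 0)}` as safe while labelling `(0, 1)` unsafe.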

