Provably Safe Reinforcement Learning with Step-wise Violation Constraints

02/13/2023
by Nuoya Xiong, et al.

In this paper, we investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in two ways: we consider stricter step-wise violation constraints, and we do not assume the existence of safe actions. This makes our formulation better suited to safety-critical applications, such as robot control and autonomous driving, that must ensure safety at every decision step and may not always have safe actions available. We propose a novel algorithm, SUCBVI, which guarantees O(√(ST)) step-wise violation and O(√(H^3SAT)) regret. We also provide lower bounds showing that both the violation and regret guarantees are optimal with respect to S and T. We further study a novel safe reward-free exploration problem with step-wise violation constraints. For this problem, we design an (ε,δ)-PAC algorithm, SRF-UCRL, which achieves nearly state-of-the-art sample complexity O((S^2AH^2/ε+H^4SA/ε^2)(log(1/δ)+S)) and guarantees O(√(ST)) violation during exploration. Experimental results demonstrate the superiority of our algorithms in safety performance and corroborate our theoretical results.
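To make the scaling of these guarantees concrete, the sketch below simply evaluates the order-level expressions quoted above. It is not an implementation of SUCBVI or SRF-UCRL: the function names are hypothetical, and all hidden constants and logarithmic factors inside the O(·) notation are dropped (an assumption), so only ratios between values are meaningful, not the absolute numbers. S, A, H and T are used as in the abstract, and eps and delta stand for ε and δ.

```python
import math


def violation_order(S: int, T: int) -> float:
    """Order of the step-wise violation bound, sqrt(S*T); constants and logs dropped."""
    return math.sqrt(S * T)


def regret_order(H: int, S: int, A: int, T: int) -> float:
    """Order of the SUCBVI regret bound, sqrt(H^3 * S * A * T); constants and logs dropped."""
    return math.sqrt(H**3 * S * A * T)


def srf_sample_complexity_order(S: int, A: int, H: int, eps: float, delta: float) -> float:
    """Order of the SRF-UCRL sample complexity,
    (S^2 A H^2 / eps + H^4 S A / eps^2) * (log(1/delta) + S); constants dropped."""
    return (S**2 * A * H**2 / eps + H**4 * S * A / eps**2) * (math.log(1 / delta) + S)


if __name__ == "__main__":
    S, A, H, T = 10, 4, 5, 10_000
    # Quadrupling T only doubles both bounds, since each grows like sqrt(T).
    print(violation_order(S, 4 * T) / violation_order(S, T))
    print(regret_order(H, S, A, 4 * T) / regret_order(H, S, A, T))
    # For small eps, the H^4 S A / eps^2 term dominates the sample complexity.
    print(srf_sample_complexity_order(S, A, H, eps=0.1, delta=0.05))
```

Because both the violation and the regret expressions grow like √T, their per-step averages shrink as T grows, which is the sense in which such sublinear bounds are read.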


research
02/08/2023

A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints

In many applications of Reinforcement Learning (RL), it is critically im...
research
11/09/2021

Safe Policy Optimization with Local Generalized Linear Function Approximations

Safe exploration is a key to applying reinforcement learning (RL) in saf...
research
09/28/2022

Guiding Safe Exploration with Weakest Preconditions

In reinforcement learning for safety-critical settings, it is often desi...
research
06/11/2021

Safe Reinforcement Learning with Linear Function Approximation

Safety in reinforcement learning has become increasingly important in re...
research
03/20/2020

Interpretable Multi Time-scale Constraints in Model-free Deep Reinforcement Learning for Autonomous Driving

In many real world applications, reinforcement learning agents have to o...
research
10/01/2020

Learning to be safe, in finite time

This paper aims to put forward the concept that learning to take safe ac...
research
11/06/2019

Safe Linear Thompson Sampling

The design and performance analysis of bandit algorithms in the presence...
