Safe Reinforcement Learning with Linear Function Approximation

06/11/2021 ∙ by Sanae Amani, et al.

Safety in reinforcement learning has become increasingly important in recent years. Yet existing solutions either fail to strictly avoid unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees for settings where the safety constraints need to be learned. In this paper, we address both problems by first modeling safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold. We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation. We show that SLUCB-QVI and RSLUCB-QVI, while incurring no safety violations, achieve a $\tilde{\mathcal{O}}(\kappa\sqrt{d^3H^3T})$ regret, nearly matching that of state-of-the-art unsafe algorithms, where $H$ is the duration of each episode, $d$ is the dimension of the feature mapping, $\kappa$ is a constant characterizing the safety constraints, and $T$ is the total number of action plays. We further present numerical simulations that corroborate our theoretical findings.
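To make the safety model concrete, the following is a minimal sketch of how an unknown linear cost can be estimated and used to keep only actions that are safe with high confidence. It is a generic construction (ridge regression plus a confidence width), not the SLUCB-QVI or RSLUCB-QVI procedure, whose details the abstract does not give. The names `ridge_estimate`, `conservative_safe_actions`, the threshold `tau`, the confidence radius `beta`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def ridge_estimate(features, costs, lam=1.0):
    """Regularized least-squares estimate of the cost parameter.

    features: (n, d) array of feature vectors of past state-action pairs.
    costs:    (n,) array of observed safety costs.
    Returns the estimate and the regularized Gram matrix.
    """
    d = features.shape[1]
    gram = lam * np.eye(d) + features.T @ features
    gamma_hat = np.linalg.solve(gram, features.T @ costs)
    return gamma_hat, gram

def conservative_safe_actions(candidates, gamma_hat, gram, tau, beta):
    """Indices of candidates whose upper confidence bound on cost is <= tau.

    beta is an assumed confidence radius; the width for a feature vector
    phi is beta * sqrt(phi^T gram^{-1} phi).
    """
    gram_inv = np.linalg.inv(gram)
    widths = beta * np.sqrt(
        np.einsum("ij,jk,ik->i", candidates, gram_inv, candidates)
    )
    upper = candidates @ gamma_hat + widths
    return np.flatnonzero(upper <= tau)

# Toy data (hypothetical): the true cost parameter is hidden from the learner.
gamma_star = np.array([1.0, 0.0])
past = np.vstack([np.tile([0.2, 0.0], (50, 1)), np.tile([0.0, 0.2], (50, 1))])
observed = past @ gamma_star  # noise-free costs, for simplicity

gamma_hat, gram = ridge_estimate(past, observed)
candidates = np.array([[0.1, 0.0], [0.9, 0.0], [0.0, 0.9]])
safe = conservative_safe_actions(candidates, gamma_hat, gram, tau=0.5, beta=1.0)
print(safe)
```

In this toy setup only the first candidate is retained. The second is excluded because its estimated cost exceeds the threshold, and the third, although its true cost is zero, is excluded because its feature direction is still too uncertain. That conservatism is the price of never violating the constraint while it is being learned.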


