Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?

04/02/2020
by Sebastien Gros, et al.

For all its successes, Reinforcement Learning (RL) still struggles to deliver formal guarantees on the closed-loop behavior of the learned policy. Among other things, guaranteeing the safety of RL when it is applied to safety-critical systems is a very active research topic. Some recent contributions propose projecting the inputs delivered by the learned policy onto a safe set, ensuring that the system's safety is never jeopardized. Unfortunately, it is unclear whether this operation can be performed without disrupting the learning process. This paper addresses this issue. The problem is analysed in the context of Q-learning and policy gradient techniques. We show that the projection approach is generally disruptive in the context of Q-learning, though a simple alternative solves the issue, whereas simple corrections can be used in policy gradient methods to ensure that the policy gradients are unbiased. The proposed results extend to safe projections based on robust MPC techniques.
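As a minimal illustration of the projection step described above (a sketch, not the paper's implementation), the code below assumes the safe set is a simple box of input bounds. For a box, the Euclidean projection reduces to component-wise clipping. The function `project_onto_box`, the bounds `lower`/`upper` and all numbers are hypothetical; safe sets arising from robust MPC are generally not boxes, and projecting onto them would require solving an optimization problem instead.

```python
import numpy as np


def project_onto_box(action, lower, upper):
    """Euclidean projection of `action` onto the box {u : lower <= u <= upper}.

    For a box, the nearest point in the 2-norm is obtained by clipping each
    component independently. (Assumption: the safe set is a box; general
    safe sets require solving an optimization problem instead.)
    """
    return np.clip(action, lower, upper)


# Hypothetical 2-D input bounds standing in for the safe set.
lower = np.array([-1.0, 0.0])
upper = np.array([1.0, 0.5])

# Action proposed by the learned policy (violates the first upper bound).
proposed = np.array([1.7, 0.2])

# Action actually applied to the system after projection.
applied = project_onto_box(proposed, lower, upper)

# An action already inside the safe set is left unchanged by the projection.
safe = np.array([0.3, 0.1])
unchanged = project_onto_box(safe, lower, upper)

print("proposed:", proposed, "applied:", applied, "unchanged:", unchanged)
```

In this sketch the policy proposes `proposed`, but the system receives `applied`. Whenever the two differ, the data observed by the learning algorithm no longer corresponds to the action the policy actually produced, which is the kind of mismatch that can disrupt learning.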

Related research
12/27/2021

Safe Reinforcement Learning with Chance-constrained Model Predictive Control

Real-world reinforcement learning (RL) problems often demand that agents...
12/14/2020

Safe Reinforcement Learning with Stability Safety Guarantees Using Robust MPC

Reinforcement Learning offers tools to optimize policies based on the da...
03/06/2023

Safe Reinforcement Learning via Probabilistic Logic Shields

Safe Reinforcement learning (Safe RL) aims at learning optimal policies ...
10/27/2020

Learning to be Safe: Deep RL with a Safety Critic

Safety is an essential component for deploying reinforcement learning (R...
01/19/2019

Towards Physically Safe Reinforcement Learning under Supervision

This paper addresses the question of how a previously available control ...
10/02/2022

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the contex...
07/19/2021

Constrained Policy Gradient Method for Safe and Fast Reinforcement Learning: a Neural Tangent Kernel Based Approach

This paper presents a constrained policy gradient algorithm. We introduc...