Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes

10/19/2022
by Niklas Kochdumper, et al.

While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue with a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents potentially unsafe actions proposed by a reinforcement learning agent from being applied by projecting each proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which makes it possible to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, simplifies incorporating the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
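
To make the idea of action projection concrete, the following is a minimal toy sketch and not the method of the paper. It assumes a one-dimensional system with dynamics x_next = x + u*dt + w, bounded noise |w| <= noise, box input constraints, and a single open interval as obstacle. Under these assumptions the set of safe actions is a union of two intervals, which is the kind of non-convex disjunction that motivates a mixed-integer formulation. Here the disjunction is simply enumerated instead of being passed to a solver, and no polynomial zonotopes are involved. All function and parameter names are hypothetical.

```python
def safe_action_intervals(x, dt, u_min, u_max, obstacle, noise):
    """Return the closed action intervals that keep the next state safe.

    Toy assumption: x_next = x + u*dt + w with |w| <= noise, so the
    reachable set for an action u is [x + u*dt - noise, x + u*dt + noise].
    The obstacle is an open interval (obs_lo, obs_hi).
    """
    obs_lo, obs_hi = obstacle
    # Reachable set lies entirely below the obstacle if u <= below_hi.
    below_hi = (obs_lo - noise - x) / dt
    # Reachable set lies entirely above the obstacle if u >= above_lo.
    above_lo = (obs_hi + noise - x) / dt

    intervals = []
    # Intersect each safe branch with the input constraints [u_min, u_max].
    if u_min <= min(u_max, below_hi):
        intervals.append((u_min, min(u_max, below_hi)))
    if max(u_min, above_lo) <= u_max:
        intervals.append((max(u_min, above_lo), u_max))
    return intervals


def project_action(u_rl, intervals):
    """Project the proposed action onto the closest safe action.

    Each interval is one branch of the disjunction; clamping gives the
    closest point within a branch, and the best branch is kept.
    Returns None if no safe action exists.
    """
    best = None
    for lo, hi in intervals:
        candidate = min(max(u_rl, lo), hi)  # closest point in [lo, hi]
        if best is None or abs(candidate - u_rl) < abs(best - u_rl):
            best = candidate
    return best


if __name__ == "__main__":
    safe = safe_action_intervals(
        x=0.0, dt=1.0, u_min=-1.0, u_max=1.0,
        obstacle=(0.25, 0.5), noise=0.125,
    )
    for u_rl in (0.0, 0.3, 0.5):
        print(u_rl, "->", project_action(u_rl, safe))
```

A proposed action that is already safe is returned unchanged, so the shield only intervenes when the agent's action could lead into the obstacle. In higher dimensions and with nonlinear dynamics, enumeration of this kind is no longer practical, which is where reachability analysis and an optimization solver come in.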


