Shielded Reinforcement Learning for Hybrid Systems

08/28/2023
by Asger Horn Brorholt, et al.

Safe and optimal controller synthesis for switched-controlled hybrid systems, which combine differential equations and discrete changes of the system's state, is known to be intricately hard. Reinforcement learning has been leveraged to construct near-optimal controllers, but their behavior is not guaranteed to be safe, even when safety is encouraged through reward engineering. One way of imposing safety on a learned controller is to use a shield, which is correct by design. However, obtaining a shield for non-linear and hybrid environments is itself intractable. In this paper, we propose the construction of a shield using the so-called barbaric method, where an approximate finite representation of an underlying partition-based two-player safety game is extracted via systematically picked samples of the true transition function. While hard safety guarantees are out of reach, we experimentally demonstrate strong statistical safety guarantees with a prototype implementation and UPPAAL STRATEGO. Furthermore, we study the impact of the synthesized shield when applied either as a pre-shield (applied before learning a controller) or as a post-shield (only applied after learning a controller). We experimentally demonstrate the superiority of the pre-shielding approach. We apply our technique to a range of case studies, including two industrial examples, and further study post-optimization of the post-shielding approach.
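The abstract only names the construction, so here is a minimal sketch, under stated assumptions, of the general idea behind a sampling-based shield. It partitions a 1D state space into cells and samples the true transition function at evenly spread points in each cell and at a few disturbance values. The resulting cell-to-cell relation is treated as a finite two-player safety game (controller picks an action, the environment picks a successor), and the maximal safe set is computed as a greatest fixed point. The toy dynamics x' = x + 0.5a + w, the grid, the sample counts, and every function name (`build_game`, `synthesize_shield`, `post_shield`, `run`) are hypothetical. None of them are taken from the paper, its prototype, or UPPAAL STRATEGO.

```python
import random

# Hypothetical toy system (assumption; not one of the paper's case studies):
# state x in [0, 10), the controller picks a velocity a, and an adversarial
# disturbance w in [-0.3, 0.3] is added each step. Safety: 1 <= x < 9.
ACTIONS = (-1.0, 0.0, 1.0)
W_SAMPLES = (-0.3, 0.0, 0.3)         # sampled disturbance values
X_MIN, X_MAX, CELL = 0.0, 10.0, 0.5  # uniform partition of the state space
N_CELLS = int((X_MAX - X_MIN) / CELL)
K = 4                                # state samples per cell


def step(x, a, w):
    """The 'true' transition function; synthesis only ever samples it."""
    return x + 0.5 * a + w


def cell_of(x):
    """Index of the partition cell containing x, or None outside the grid."""
    if not (X_MIN <= x < X_MAX):
        return None
    return int((x - X_MIN) // CELL)


def cell_is_unsafe(c):
    lo = X_MIN + c * CELL
    return lo < 1.0 or lo + CELL > 9.0


def build_game():
    """Approximate the finite game: for each (cell, action), collect the cells
    hit by successors of K evenly spread samples under each sampled w.
    Leaving the grid is recorded as the sentinel cell -1 (never safe)."""
    reach = {}
    for c in range(N_CELLS):
        lo = X_MIN + c * CELL
        samples = [lo + (i + 0.5) * CELL / K for i in range(K)]
        for a in ACTIONS:
            succ = set()
            for x in samples:
                for w in W_SAMPLES:
                    nxt = cell_of(step(x, a, w))
                    succ.add(-1 if nxt is None else nxt)
            reach[c, a] = succ
    return reach


def synthesize_shield(reach):
    """Greatest fixed point: keep a cell only while some action sends every
    sampled successor into the current safe set."""
    safe = {c for c in range(N_CELLS) if not cell_is_unsafe(c)}
    changed = True
    while changed:
        changed = False
        for c in sorted(safe):  # iterate over a copy while shrinking `safe`
            if not any(reach[c, a] <= safe for a in ACTIONS):
                safe.discard(c)
                changed = True
    # The shield maps each safe cell to the actions that keep play inside it.
    return {c: {a for a in ACTIONS if reach[c, a] <= safe} for c in safe}


def post_shield(shield, x, proposed):
    """Post-shielding: keep the controller's action if allowed, otherwise
    substitute the closest allowed action."""
    allowed = shield[cell_of(x)]
    if proposed in allowed:
        return proposed
    return min(allowed, key=lambda a: abs(a - proposed))


def run(steps=1000, seed=0):
    """Post-shield a naive 'always push right' controller and simulate it with
    a continuous disturbance; return 1 if the safe region is left, else 0."""
    shield = synthesize_shield(build_game())
    rng = random.Random(seed)
    x = 5.0
    for _ in range(steps):
        a = post_shield(shield, x, 1.0)
        x = step(x, a, rng.uniform(-0.3, 0.3))
        if not (1.0 <= x < 9.0):
            return 1  # a sampled shield carries no hard guarantee
    return 0


if __name__ == "__main__":
    print("violations:", run())
```

Here the shield acts as a post-shield, overriding a fixed controller only when its action is not allowed. A pre-shield would instead restrict the learner to `shield[cell_of(x)]` throughout training. Because the game is built from samples, coarser sampling (for example, only `w = 0`) can yield a shield that permits unsafe moves. This is consistent with the abstract offering statistical rather than hard guarantees.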

