Safe Reinforcement Learning via Shielding

08/29/2017
by Mohammed Alshiekh, et al.

Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during the learning or execution phases. We introduce a new approach to learning optimal policies while enforcing properties expressed in temporal logic. Given a temporal logic specification that the learning system must obey, we propose to synthesize a reactive system called a shield. The shield can be introduced into the traditional learning process in two alternative ways, depending on where it is implemented. In the first, the shield acts each time the learning agent is about to make a decision and provides a list of safe actions. In the second, the shield is placed after the learning agent: it monitors the actions chosen by the learner and corrects them only if the chosen action would violate the specification. We discuss the requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.
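The two shield placements described above can be illustrated with a minimal sketch. This is not the paper's implementation: the safety specification here is a toy predicate `is_safe` (never step off a 1-D track of length 10) rather than a synthesized reactive system, and the function names are hypothetical.

```python
import random

# Toy safety specification (an assumption for illustration): on a 1-D track
# with positions 0..9, an action is safe if it keeps the agent on the track.
def is_safe(state, action):
    return 0 <= state + action <= 9

ACTIONS = [-1, +1]  # move left / move right

def preemptive_shield(state):
    """Shield placed *before* the learner: it offers only the safe actions,
    and the learning agent picks among them."""
    return [a for a in ACTIONS if is_safe(state, a)]

def post_posed_shield(state, action):
    """Shield placed *after* the learner: it passes the chosen action through
    unchanged unless it violates the specification, in which case it
    substitutes a safe action."""
    if is_safe(state, action):
        return action
    return random.choice(preemptive_shield(state))

# At the boundary state 0, the unsafe action -1 is blocked or corrected;
# in the interior (state 5), the shield does not interfere.
print(preemptive_shield(0))      # [1]
print(post_posed_shield(0, -1))  # 1
print(post_posed_shield(5, -1))  # -1
```

In both placements, every executed action satisfies the specification by construction; the difference is only whether the learner sees the restricted action set up front or has its output monitored and corrected.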

Related research:

- Safe Reinforcement Learning via Probabilistic Logic Shields (03/06/2023): Safe Reinforcement learning (Safe RL) aims at learning optimal policies ...
- Cautious Reinforcement Learning with Logical Constraints (02/26/2020): This paper presents the concept of an adaptive safe padding that forces ...
- Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning (04/13/2023): Multi-Agent Reinforcement Learning (MARL) discovers policies that maximi...
- Reinforcement Learning with Probabilistic Guarantees for Autonomous Driving (04/15/2019): Designing reliable decision strategies for autonomous urban driving is c...
- Symbolic Reinforcement Learning for Safe RAN Control (03/11/2021): In this paper, we demonstrate a Symbolic Reinforcement Learning (SRL) ar...
- Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning (09/24/2018): In the real world, agents often have to operate in situations with incom...
- It's Time to Play Safe: Shield Synthesis for Timed Systems (06/30/2020): Erroneous behaviour in safety critical real-time systems may inflict ser...
