Online Shielding for Stochastic Systems

by Bettina Könighofer, et al.

In this paper, we propose a method for developing trustworthy reinforcement learning systems. To ensure safety, especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that prevents the agent from taking any action that is unsafe with respect to a temporal logic specification. Our main contribution is a new synthesis algorithm that computes the shield online. Existing offline shielding approaches exhaustively compute the safety of all state-action combinations ahead of time, which results in huge offline computation times, large memory consumption, and significant run-time delays due to look-ups in a huge database. The intuition behind online shielding is to compute, during run-time, the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed, and the result is used for shielding as soon as one of the considered states is reached. Our proposed method is general and can be applied to a wide range of planning problems with stochastic behavior. For our evaluation, we selected a 2-player version of the classical computer game SNAKE. The game requires fast decisions, and the multiplayer setting induces a large state space that is computationally expensive to analyze exhaustively. The safety objective of collision avoidance is easily transferable to a variety of planning tasks.
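The online shielding idea described above can be illustrated with a minimal sketch. This is not the paper's synthesis algorithm: the toy corridor model, the slip probability, the absolute risk threshold, the fallback to the least risky action, and all names (`transitions`, `min_risk`, `action_risk`, `reachable`, `build_shield`) are assumptions made for illustration. The sketch enumerates the states reachable within a few steps of the current state and, for each of them, keeps only the actions whose bounded-horizon probability of reaching an unsafe state stays below the threshold.

```python
from functools import lru_cache

# Hypothetical toy model (an assumption, not taken from the paper):
# an agent on positions 0..MAX_POS; position 0 is the unsafe "collision" state.
ACTIONS = {"left": -1, "stay": 0, "right": 1}
UNSAFE = frozenset({0})
MAX_POS = 5
SLIP = 0.2  # assumed probability of being pushed one step left instead of moving


def transitions(state, action):
    """Return a list of (probability, next_state) pairs for one step."""
    if state in UNSAFE:
        return [(1.0, state)]  # unsafe states are absorbing
    intended = min(max(state + ACTIONS[action], 0), MAX_POS)
    pushed = max(state - 1, 0)
    return [(1.0 - SLIP, intended), (SLIP, pushed)]


@lru_cache(maxsize=None)
def min_risk(state, horizon):
    """Minimal probability of reaching an unsafe state within `horizon` steps."""
    if state in UNSAFE:
        return 1.0
    if horizon == 0:
        return 0.0
    return min(action_risk(state, a, horizon) for a in ACTIONS)


def action_risk(state, action, horizon):
    """Risk of taking `action` now and acting as safely as possible afterwards."""
    return sum(p * min_risk(nxt, horizon - 1)
               for p, nxt in transitions(state, action))


def reachable(state, steps):
    """All states that could be reached within `steps` steps under any action."""
    frontier, seen = {state}, {state}
    for _ in range(steps):
        frontier = {nxt for s in frontier for a in ACTIONS
                    for _, nxt in transitions(s, a)} - seen
        seen |= frontier
    return seen


def build_shield(current, lookahead=1, horizon=2, threshold=0.1):
    """Precompute the allowed actions for every state reachable in the near future."""
    shield = {}
    for s in reachable(current, lookahead):
        risks = {a: action_risk(s, a, horizon) for a in ACTIONS}
        allowed = {a for a, r in risks.items() if r <= threshold}
        if not allowed:
            # Assumed fallback: never block everything, keep the least risky action(s).
            best = min(risks.values())
            allowed = {a for a, r in risks.items() if r == best}
        shield[s] = allowed
    return shield
```

In this sketch, a controller would call `build_shield(current_state)` while the agent is still acting, then look up `shield[next_state]` once one of the precomputed states is actually reached and restrict the agent's choice to that set. Because only near-future states are analysed, the table stays small compared with an exhaustive offline computation over all state-action pairs.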




Online Shielding for Reinforcement Learning

Besides the recent impressive results on reinforcement learning (RL), sa...

Safe Multi-Agent Reinforcement Learning via Shielding

Multi-agent reinforcement learning (MARL) has been increasingly used in ...

Safe Reinforcement Learning via Online Shielding

Reinforcement learning is a promising approach to learning control polic...

Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version)

Safe exploration aims at addressing the limitations of Reinforcement Lea...

Constrained Exploration and Recovery from Experience Shaping

We consider the problem of reinforcement learning under safety requireme...

Online Strategy Synthesis for Safe and Optimized Control of Steerable Needles

Autonomous systems are often applied in uncertain environments, which re...

Online Synthesis for Runtime Enforcement of Safety in Multi-Agent Systems

A shield is attached to a system to guarantee safety by correcting the s...
