Shielded Decision-Making in MDPs

07/16/2018
by   Nils Jansen, et al.
0

A prominent problem in artificial intelligence and machine learning is the safe exploration of an environment. In particular, reinforcement learning is a well-known technique to determine optimal policies for complicated dynamic systems, but suffers from the fact that such policies may induce harmful behavior. We present the concept of a shield that forces decision-making to provably adhere to safety requirements with high probability. Our method exploits the inherent uncertainties in scenarios given by Markov decision processes. We present a method to compute probabilities of decision making regarding temporal logic constraints. We use that information to realize a shield that---when applied to a reinforcement learning algorithm---ensures (near-)optimal behavior both for the safety constraints and for the actual learning objective. In our experiments, we show on the arcade game PAC-MAN that the learning efficiency increases as the learning needs orders of magnitude fewer episodes. We show tradeoffs between sufficient progress in exploration of the environment and ensuring strict safety.

READ FULL TEXT
research
02/26/2020

Cautious Reinforcement Learning with Logical Constraints

This paper presents the concept of an adaptive safe padding that forces ...
research
05/31/2022

Robust Anytime Learning of Markov Decision Processes

Markov decision processes (MDPs) are formal models commonly used in sequ...
research
10/29/2016

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

This paper studies systematic exploration for reinforcement learning wit...
research
07/07/2020

Provably Safe PAC-MDP Exploration Using Analogies

A key challenge in applying reinforcement learning to safety-critical do...
research
01/16/2014

Non-Deterministic Policies in Markovian Decision Processes

Markovian processes have long been used to model stochastic environments...
research
12/04/2022

Online Shielding for Reinforcement Learning

Besides the recent impressive results on reinforcement learning (RL), sa...
research
02/13/2020

Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints

Recurrent neural networks (RNNs) have emerged as an effective representa...

Please sign up or login with your details

Forgot password? Click here to reset