Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

06/15/2016
by Matteo Turchetta, et al.

In classical reinforcement learning, agents exploring an environment accept arbitrary short-term loss for long-term gain. This is infeasible for safety-critical applications, such as robotics, where even a single unsafe action may cause system failure. In this paper, we address the problem of safely exploring finite Markov decision processes (MDPs). We define safety in terms of an a priori unknown safety constraint that depends on states and actions. We aim to explore the MDP under this constraint, assuming that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop a novel algorithm for this task and prove that it is able to completely explore the safely reachable part of the MDP without violating the safety constraint. To achieve this, it cautiously explores safe states and actions in order to gain statistical confidence about the safety of unvisited state-action pairs from noisy observations collected while navigating the environment. Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out. We demonstrate our method on digital terrain models for the task of exploring an unknown map with a rover.
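The core mechanism the abstract describes (classifying unvisited states as safe only when a Gaussian process lower confidence bound on the unknown safety function clears a threshold, then sampling the most uncertain provably-safe state) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names, the 1D state space, the RBF kernel, and the confidence parameter `beta` are all assumptions for the sketch, and the reachability/returnability (ergodicity) checks that the full method performs on the MDP are omitted for brevity.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1D states (a modeling assumption)."""
    d = np.abs(X1[:, None] - X2[None, :])
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise=0.01):
    """Exact GP posterior mean and standard deviation at the query states."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf_kernel(X_query, X_obs)
    Kss = rbf_kernel(X_query, X_query)
    mu = Ks @ np.linalg.solve(K, y_obs)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mu, std

def safe_explore_step(states, X_obs, y_obs, h, beta=2.0):
    """One exploration step: certify states whose GP lower confidence bound
    exceeds the safety threshold h, then pick the most uncertain safe state.
    The real algorithm additionally intersects this set with states that are
    reachable and returnable in the MDP; that part is omitted here."""
    mu, std = gp_posterior(X_obs, y_obs, states)
    lower, upper = mu - beta * std, mu + beta * std
    safe = lower >= h
    if not safe.any():
        return None, safe
    width = upper - lower           # uncertainty of each state
    width[~safe] = -np.inf          # never sample outside the safe set
    return int(np.argmax(width)), safe
```

Starting from a state known to be safe, repeated calls expand the certified-safe set outward: each new (noisy) observation tightens the confidence intervals of nearby states, which may then clear the threshold and become candidates themselves.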


Related research

- Safe Reinforcement Learning in Constrained Markov Decision Processes (08/15/2020): Safe reinforcement learning has been a promising approach for optimizing...
- Safe Exploration for Interactive Machine Learning (10/30/2019): In Interactive Machine Learning (IML), we iteratively make decisions and...
- Safe Exploration in Markov Decision Processes with Time-Variant Safety using Spatio-Temporal Gaussian Process (09/12/2018): In many real-world applications (e.g., planetary exploration, robot navi...
- Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models (04/01/2019): We propose a safe exploration algorithm for deterministic Markov Decisio...
- Automata Learning meets Shielding (12/04/2022): Safety is still one of the major research challenges in reinforcement le...
- Information-Theoretic Safe Exploration with Gaussian Processes (12/09/2022): We consider a sequential decision making task where we are not allowed t...
- Formal Language Constraints for Markov Decision Processes (10/02/2019): In order to satisfy safety conditions, a reinforcement learned (RL) agen...
