Safe Exploration in Markov Decision Processes with Time-Variant Safety using Spatio-Temporal Gaussian Process

09/12/2018
by Akifumi Wachi, et al.

In many real-world applications (e.g., planetary exploration, robot navigation), an autonomous agent must be able to explore a space with guaranteed safety. Most safe exploration algorithms in reinforcement learning and robotics assume that the safety features are known a priori and are time-invariant. This paper presents a learning algorithm called ST-SafeMDP for exploring Markov decision processes (MDPs) under the assumption that the safety features are unknown a priori and time-variant. In this setting, the agent explores an MDP while keeping the probability of entering unsafe states, as defined by a safety function, below a threshold. The unknown and time-variant safety values are modeled using a spatio-temporal Gaussian process. However, because the true safe space may shrink over time, the agent can be left with no viable action. To address this issue, we formulate the problem of maximizing the cumulative number of safe states in the worst-case scenario with respect to future observations. The effectiveness of this approach is demonstrated in two simulation settings, including one using real lunar terrain data.
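To make the modeling idea concrete, the following is a minimal sketch of how a spatio-temporal Gaussian process could be used to label states as safe. It is not the ST-SafeMDP algorithm itself. The product of a spatial and a temporal squared-exponential kernel, the length scales, the noise level, the confidence multiplier `beta`, and the threshold are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def rbf(a, b, lengthscale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def st_kernel(x1, x2, ls_space=1.0, ls_time=2.0):
    """Assumed product kernel: last column is time, the others are space."""
    k_space = rbf(x1[:, :-1], x2[:, :-1], ls_space)
    k_time = rbf(x1[:, -1:], x2[:, -1:], ls_time)
    return k_space * k_time

def gp_posterior(x_obs, y_obs, x_query, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_oo = st_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = st_kernel(x_query, x_obs)
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    v = np.linalg.solve(chol, k_qo.T)
    var = 1.0 - (v**2).sum(axis=0)  # prior variance is 1 for this kernel
    return k_qo @ alpha, np.sqrt(np.clip(var, 0.0, None))

def safe_mask(mean, std, threshold, beta=2.0):
    """A state counts as safe if its lower confidence bound clears the threshold."""
    return mean - beta * std >= threshold

# Two noisy safety observations at time 0; columns are (x, y, t).
x_obs = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
y_obs = np.array([1.0, 0.9])

# Query the same location now (t = 0) and far in the future (t = 10).
x_query = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 10.0]])
mean, std = gp_posterior(x_obs, y_obs, x_query)
mask = safe_mask(mean, std, threshold=0.5)
```

In this sketch the uncertainty about a state grows as the query time moves away from the observation times, so a state that is confidently safe now is no longer classified as safe far in the future. This is one simple way to see why the set of states known to be safe can shrink when safety is time-variant.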


