Safety-Aware Apprenticeship Learning

10/22/2017
by Weichao Zhou, et al.

Apprenticeship learning (AL) is a class of "learning from demonstrations" techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure both safety and performance of the learned policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
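
To make the two ingredients of this setting concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical four-state Markov chain (the MDP after a policy has been fixed), a hypothetical two-dimensional feature vector, and a step-bounded reachability property of the PCTL form P<=p [ true U<=k unsafe ]. The function names, the transition probabilities, and the thresholds are all illustrative assumptions.

```python
from typing import Dict, List, Set

# Assumed representation: state -> {successor state: probability}.
Chain = Dict[int, Dict[int, float]]


def linear_reward(weights: List[float], features: List[float]) -> float:
    """Reward as a linear combination of state features: R(s) = w . f(s)."""
    return sum(w * f for w, f in zip(weights, features))


def bounded_reach_probability(chain: Chain, unsafe: Set[int],
                              start: int, horizon: int) -> float:
    """Probability of reaching an unsafe state within `horizon` steps."""
    # With zero steps left, only unsafe states have already "reached" unsafe.
    prob = {s: 1.0 if s in unsafe else 0.0 for s in chain}
    for _ in range(horizon):
        # One backward step: average the successors' probabilities.
        prob = {
            s: 1.0 if s in unsafe
            else sum(p * prob[t] for t, p in chain[s].items())
            for s in chain
        }
    return prob[start]


def satisfies_safety(chain: Chain, unsafe: Set[int], start: int,
                     horizon: int, threshold: float) -> bool:
    """Check the bounded-reachability property P<=threshold [ true U<=horizon unsafe ]."""
    return bounded_reach_probability(chain, unsafe, start, horizon) <= threshold


# Hypothetical chain induced by some policy: state 2 is unsafe, state 3 is the goal.
example_chain: Chain = {
    0: {1: 0.9, 2: 0.1},
    1: {3: 0.8, 2: 0.2},
    2: {2: 1.0},
    3: {3: 1.0},
}

if __name__ == "__main__":
    print(linear_reward([0.5, -1.0], [1.0, 0.0]))
    print(bounded_reach_probability(example_chain, {2}, start=0, horizon=2))
    print(satisfies_safety(example_chain, {2}, start=0, horizon=2, threshold=0.3))
```

In a counterexample-guided loop of the kind the abstract describes, a failed check like this would be the point at which a model checker returns a counterexample for the learner to use. That step is not sketched here, since the abstract does not specify it.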


research
09/21/2020

Learn to Exceed: Stereo Inverse Reinforcement Learning with Concurrent Policy Optimization

In this paper, we study the problem of obtaining a control policy that c...
research
12/10/2019

Deep Bayesian Reward Learning from Preferences

Bayesian inverse reinforcement learning (IRL) methods are ideal for safe...
research
11/05/2019

Apprenticeship Learning via Frank-Wolfe

We consider the applications of the Frank-Wolfe (FW) algorithm for Appre...
research
06/15/2023

Residual Q-Learning: Offline and Online Policy Customization without Value

Imitation Learning (IL) is a widely used framework for learning imitativ...
research
05/15/2023

An Offline Time-aware Apprenticeship Learning Framework for Evolving Reward Functions

Apprenticeship learning (AL) is a process of inducing effective decision...
research
10/11/2017

Specification Inference from Demonstrations

Learning from expert demonstrations has received a lot of attention in a...
research
11/07/2022

Learning Probabilistic Temporal Safety Properties from Examples in Relational Domains

We propose a framework for learning a fragment of probabilistic computat...
