Avoidance Learning Using Observational Reinforcement Learning

09/24/2019
by David Venuto et al.

Imitation learning seeks to learn an expert policy from sampled demonstrations. In the real world, however, a perfect expert is often hard to find, and avoiding dangerous behaviors becomes important for safety. We present the idea of learning to avoid, an objective that is in some sense the opposite of imitation learning: an agent learns to avoid a demonstrator's policy in a given environment. We define avoidance learning as the process of optimizing the agent's reward while avoiding the dangerous behaviors exhibited by a demonstrator. In this work, we develop a framework for avoidance learning by defining a suitable objective function for these problems, which involves the distance between the state occupancy distributions of the agent's and the demonstrator's policies. We use density estimates of the state occupancy measures and use this distance as a reward bonus for avoiding the demonstrator. We validate our theory with experiments on a wide range of partially observable environments. Experimental results show that our approach improves sample efficiency during training compared to state-of-the-art policy optimization and safety methods.
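The abstract does not say which density estimator, which distance, or which bonus weighting the authors use. The sketch below is therefore only one plausible reading, built on our own assumptions. It fits an isotropic Gaussian kernel density estimate to sampled agent states and to sampled demonstrator states. It then uses the KL divergence KL(d_agent || d_demo) as the distance and turns it into a per-state bonus: the log-density ratio log d_agent(s) - log d_demo(s), whose average over agent states is a Monte Carlo estimate of that KL. The bonus is scaled by a weight `beta`. All function names (`gaussian_kde_logpdf`, `avoidance_bonus`, `shaped_rewards`, `kl_estimate`) and the parameters `bandwidth` and `beta` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

State = Sequence[float]


def gaussian_kde_logpdf(x: State, samples: Sequence[State], bandwidth: float) -> float:
    """Log-density of x under an isotropic Gaussian KDE fitted to `samples`."""
    d = len(x)
    n = len(samples)
    # Log normalizing constant of a single d-dimensional isotropic Gaussian kernel.
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2)
    log_kernels = []
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        log_kernels.append(log_norm - sq_dist / (2.0 * bandwidth ** 2))
    # Log-sum-exp keeps tiny kernel values from underflowing to log(0).
    m = max(log_kernels)
    return m + math.log(sum(math.exp(lk - m) for lk in log_kernels)) - math.log(n)


def avoidance_bonus(state: State, agent_states: Sequence[State],
                    demo_states: Sequence[State], bandwidth: float = 0.5) -> float:
    """Hypothetical per-state bonus: log d_agent(s) - log d_demo(s).

    Positive where the agent's estimated occupancy exceeds the demonstrator's,
    negative in regions the demonstrator visits more often.
    """
    return (gaussian_kde_logpdf(state, agent_states, bandwidth)
            - gaussian_kde_logpdf(state, demo_states, bandwidth))


def shaped_rewards(rewards: List[float], states: Sequence[State],
                   agent_states: Sequence[State], demo_states: Sequence[State],
                   beta: float = 0.1, bandwidth: float = 0.5) -> List[float]:
    """Add beta * avoidance_bonus to each environment reward."""
    return [r + beta * avoidance_bonus(s, agent_states, demo_states, bandwidth)
            for r, s in zip(rewards, states)]


def kl_estimate(agent_states: Sequence[State], demo_states: Sequence[State],
                bandwidth: float = 0.5) -> float:
    """Monte Carlo estimate of KL(d_agent || d_demo): the mean bonus over agent samples.

    The same agent samples are used to fit and to evaluate the agent KDE,
    so this resubstitution estimate is biased upward.
    """
    total = sum(avoidance_bonus(s, agent_states, demo_states, bandwidth)
                for s in agent_states)
    return total / len(agent_states)


if __name__ == "__main__":
    # Toy 2-D example: the demonstrator lingers near the origin (the "dangerous" region),
    # while the agent's recent states cluster near (3, 3).
    demo = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1)]
    agent = [(3.0, 3.0), (2.9, 3.1), (3.1, 2.9)]
    print("bonus far from demonstrator:", avoidance_bonus((3.0, 3.0), agent, demo))
    print("bonus near demonstrator:", avoidance_bonus((0.0, 0.0), agent, demo))
    print("KL estimate:", kl_estimate(agent, demo))
    print("shaped rewards:", shaped_rewards([1.0, 1.0], [(3.0, 3.0), (0.0, 0.0)], agent, demo))
```

In this toy run, the state far from the demonstrator's states receives a positive bonus, and the state the demonstrator frequents receives a negative one. A policy optimizer trained on the shaped rewards is thus nudged away from the demonstrator's occupancy. A naive kernel estimate like this one scales poorly with state dimension. That is one reason a real implementation might swap in a learned density model. This, too, is an assumption rather than something the abstract states.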

Related research

Support-weighted Adversarial Imitation Learning (02/20/2020)
Adversarial Imitation Learning (AIL) is a broad family of imitation lear...

IL-flOw: Imitation Learning from Observation using Normalizing Flows (05/19/2022)
We present an algorithm for Inverse Reinforcement Learning (IRL) from ex...

Sample Efficient Imitation Learning via Reward Function Trained in Advance (11/23/2021)
Imitation learning (IL) is a framework that learns to imitate expert beh...

Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation (11/30/2022)
Reinforcement Learning has emerged as a strong alternative to solve opti...

A Ranking Game for Imitation Learning (02/07/2022)
We propose a new framework for imitation learning - treating imitation a...

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning (10/13/2021)
We consider the problem of using expert data with unobserved confounders...

RadGrad: Active learning with loss gradients (06/18/2019)
Solving sequential decision prediction problems, including those in imit...
