Sample Efficient Imitation Learning via Reward Function Trained in Advance

11/23/2021
by Lihua Zhang, et al.

Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations. Recently, IL has shown promising results on high-dimensional and control tasks. However, IL typically suffers from sample inefficiency in terms of environment interaction, which largely restricts its application to simulated domains. In industrial applications, the learner usually incurs a high interaction cost: the more it interacts with the environment, the more damage it causes to the environment and to itself. In this article, we make an effort to improve sample efficiency by introducing a novel scheme of inverse reinforcement learning. Our method, which we call Model Reward Function Based Imitation Learning (MRFIL), uses an ensemble dynamic model, trained in advance on expert demonstrations, as a reward function. The key idea is to give the agent an incentive to match the demonstrations over a long horizon by providing a positive reward whenever it encounters states in line with the expert demonstration distribution. In addition, we demonstrate a convergence guarantee for the new objective function. Experimental results show that our algorithm achieves competitive performance while significantly reducing environment interactions compared with other IL methods.
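The abstract does not say how the reward is computed from the ensemble, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes bootstrapped linear models fitted to expert transitions and a reward of exp(-prediction error), which is positive and approaches 1 when a transition looks like the expert data. The toy dynamics and all names (`toy_step`, `make_expert_data`, `fit_ensemble`, `ensemble_reward`) are hypothetical.

```python
import numpy as np


def toy_step(state, action):
    """Toy nonlinear dynamics (an assumption for illustration only)."""
    return np.sin(state) + 0.1 * action


def make_expert_data(n=500, seed=0):
    """Expert transitions concentrated near the origin of a 2-D state space."""
    rng = np.random.default_rng(seed)
    states = 0.3 * rng.standard_normal((n, 2))
    actions = -0.5 * states + 0.05 * rng.standard_normal((n, 2))
    return states, actions, toy_step(states, actions)


def fit_ensemble(states, actions, next_states, n_models=5, ridge=1e-3, seed=0):
    """Fit bootstrapped ridge-regression models s' ~ [s, a, 1] @ W on expert data."""
    rng = np.random.default_rng(seed)
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
        Xb, Yb = X[idx], next_states[idx]
        W = np.linalg.solve(Xb.T @ Xb + ridge * np.eye(X.shape[1]), Xb.T @ Yb)
        models.append(W)
    return models


def ensemble_reward(models, state, action, next_state):
    """Assumed reward: exp(-mean squared prediction error over the ensemble).

    The models are only accurate near the expert data, so transitions from
    expert-like states score close to 1 and unfamiliar ones close to 0.
    """
    x = np.concatenate([state, action, [1.0]])
    preds = np.stack([x @ W for W in models])
    err = np.mean(np.sum((preds - next_state) ** 2, axis=1))
    return float(np.exp(-err))


# The reward model is trained once, before any policy optimization.
models = fit_ensemble(*make_expert_data())

s_in = np.array([0.1, -0.2])          # state similar to the expert's
a_in = -0.5 * s_in
r_in = ensemble_reward(models, s_in, a_in, toy_step(s_in, a_in))

s_out = np.array([3.0, -3.0])         # state far from the expert data
a_out = np.zeros(2)
r_out = ensemble_reward(models, s_out, a_out, toy_step(s_out, a_out))

print(f"expert-like reward: {r_in:.3f}, off-distribution reward: {r_out:.3g}")
```

Because the reward model in this sketch is fixed before the agent starts learning, computing a reward needs no further environment queries or discriminator updates, which is one plausible reading of where the sample-efficiency gain would come from.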

research
09/09/2018

Addressing Sample Inefficiency and Reward Bias in Inverse Reinforcement Learning

The Generative Adversarial Imitation Learning (GAIL) framework from Ho &...
research
06/08/2020

Primal Wasserstein Imitation Learning

Imitation Learning (IL) methods seek to match the behavior of an agent w...
research
12/30/2019

A New Framework for Query Efficient Active Imitation Learning

We seek to align agent policy with human expert behavior in a reinforcem...
research
07/16/2021

Visual Adversarial Imitation Learning using Variational Models

Reward function specification, which requires considerable human effort ...
research
07/31/2021

Risk Averse Bayesian Reward Learning for Autonomous Navigation from Human Demonstration

Traditional imitation learning provides a set of methods and algorithms ...
research
09/24/2019

Avoidance Learning Using Observational Reinforcement Learning

Imitation learning seeks to learn an expert policy from sampled demonstr...
research
06/30/2022

Watch and Match: Supercharging Imitation with Regularized Optimal Transport

Imitation learning holds tremendous promise in learning policies efficie...