Provably Efficient Imitation Learning from Observation Alone

05/27/2019
by   Wen Sun, et al.
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically consider only tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
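
To make the Integral Probability Metric (IPM) mentioned above concrete, here is a minimal sketch of an empirical IPM between two sets of observations. It is not the paper's implementation. It assumes a small finite class of discriminator functions and scalar observations, both chosen only for illustration, and the name `empirical_ipm` is hypothetical. The quantity computed is the largest absolute gap, over the discriminator class, between the mean discriminator value on expert observations and on learner observations.

```python
from typing import Callable, Sequence


def empirical_ipm(
    expert_obs: Sequence[float],
    learner_obs: Sequence[float],
    discriminators: Sequence[Callable[[float], float]],
) -> float:
    """Empirical IPM: max over f of |mean f(expert) - mean f(learner)|.

    Illustrative assumption: the discriminator class is a small finite
    list of functions on scalar observations.
    """

    def mean(f: Callable[[float], float], xs: Sequence[float]) -> float:
        return sum(f(x) for x in xs) / len(xs)

    return max(
        abs(mean(f, expert_obs) - mean(f, learner_obs)) for f in discriminators
    )


# Hypothetical discriminator class: the identity and a threshold indicator.
discriminators = [
    lambda x: x,
    lambda x: 1.0 if x > 0.5 else 0.0,
]

expert = [0.0, 0.0, 1.0, 1.0]
learner = [1.0, 1.0, 1.0, 1.0]

gap = empirical_ipm(expert, learner, discriminators)
print(gap)
```

A learner whose observation samples match the expert's drives this gap to zero, which is the sense in which minimizing an IPM encourages the learner's observation distribution to resemble the expert's.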

