Toward the Fundamental Limits of Imitation Learning

09/13/2020 ∙ by Nived Rajaraman, et al.

Imitation learning (IL) aims to mimic the behavior of an expert policy in a sequential decision-making problem given only demonstrations. In this paper, we focus on understanding the minimax statistical limits of IL in episodic Markov Decision Processes (MDPs). We first consider the setting where the learner is provided a dataset of N expert trajectories ahead of time, and cannot interact with the MDP. Here, we show that the policy which mimics the expert whenever possible is in expectation ≲ |𝒮| H^2 log(N)/N suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy. Here 𝒮 is the state space, and H is the length of the episode. Furthermore, we establish a suboptimality lower bound of ≳ |𝒮| H^2 / N which applies even if the expert is constrained to be deterministic, or if the learner is allowed to actively query the expert at visited states while interacting with the MDP for N episodes. To our knowledge, this is the first algorithm with suboptimality having no dependence on the number of actions, under no additional assumptions. We then propose a novel algorithm based on minimum-distance functionals in the setting where the transition model is given and the expert is deterministic. The algorithm is suboptimal by ≲ min{ H √(|𝒮| / N), |𝒮| H^{3/2} / N }, showing that knowledge of transition improves the minimax rate by at least a √H factor.
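
As an illustration of the "mimic the expert whenever possible" policy in the no-interaction setting, here is a minimal Python sketch for a tabular episodic MDP. It is not the paper's code. The data layout (each trajectory is a list of H `(state, action)` pairs), the function names `fit_mimic_policy` and `act`, and the uniform fallback at unseen states are all assumptions made for illustration; the abstract only requires that the learner copy the expert wherever the dataset allows.

```python
import random
from collections import Counter, defaultdict


def fit_mimic_policy(trajectories):
    """Count expert actions for every (time step, state) pair in the dataset.

    `trajectories` is assumed to be a list of episodes, each a list of
    (state, action) pairs ordered by time step.
    """
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        for t, (state, action) in enumerate(trajectory):
            counts[(t, state)][action] += 1
    return counts


def act(counts, t, state, actions, rng=random):
    """Pick an action at time step t in `state`.

    If the expert was observed at (t, state), sample from the empirical
    distribution of its actions there (this also covers stochastic experts).
    Otherwise fall back to a uniform choice, which is an assumption here.
    """
    seen = counts.get((t, state))
    if not seen:
        return rng.choice(actions)
    observed_actions = list(seen.keys())
    weights = list(seen.values())
    return rng.choices(observed_actions, weights=weights, k=1)[0]
```

The policy is indexed by time step as well as state because optimal behavior in an episodic MDP can depend on how many steps remain. Nothing in the sketch depends on the size of the action set except the fallback, which is consistent with the abstract's claim that the suboptimality bound has no dependence on the number of actions.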


