Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis

08/03/2022
by   Tian Xu, et al.

Imitation learning learns a policy from expert trajectories. While expert data is believed to be crucial for imitation quality, it has been found that one family of imitation learning approaches, adversarial imitation learning (AIL), can achieve exceptional performance. With as few as one expert trajectory, AIL can match expert performance even over long horizons, on tasks such as locomotion control. This phenomenon raises two questions. First, why can AIL perform well with only a few expert trajectories? Second, why does AIL maintain good performance regardless of the length of the planning horizon? In this paper, we theoretically explore these two questions. For a total-variation-distance-based AIL (called TV-AIL), our analysis shows a horizon-free imitation gap of 𝒪(min{1, √(|𝒮|/N)}) on a class of instances abstracted from locomotion control tasks. Here |𝒮| is the state space size of a tabular Markov decision process and N is the number of expert trajectories. We emphasize two important features of this bound. First, it is meaningful in both the small and large sample regimes. Second, it shows that the imitation gap of TV-AIL is at most 1 regardless of the planning horizon. This bound therefore explains the empirical observation. Technically, we leverage the multi-stage policy optimization structure of TV-AIL and present a new stage-coupled analysis via dynamic programming.
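To make the shape of the bound concrete, here is a minimal sketch (constant factors omitted, function name is illustrative, not from the paper) of how the imitation gap bound 𝒪(min{1, √(|𝒮|/N)}) behaves as the number of expert trajectories N grows:

```python
import math

def tv_ail_imitation_gap_bound(num_states: int, num_trajectories: int) -> float:
    """Illustrative upper bound min(1, sqrt(|S| / N)) on the imitation gap
    of TV-AIL, ignoring constant factors. Note the horizon H does not appear."""
    return min(1.0, math.sqrt(num_states / num_trajectories))

# Small-sample regime (N << |S|): the min caps the gap at 1.
print(tv_ail_imitation_gap_bound(num_states=100, num_trajectories=1))    # 1.0

# Large-sample regime (N > |S|): the gap decays like sqrt(|S|/N).
print(tv_ail_imitation_gap_bound(num_states=100, num_trajectories=400))  # 0.5
```

The two branches of the min correspond to the two regimes the abstract highlights: the gap never exceeds 1 however few trajectories are available, and it shrinks at a √(|𝒮|/N) rate once N is large.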

Related research

08/17/2023  Regularizing Adversarial Imitation Learning Using Causal Invariance
    Imitation learning methods are used to infer a policy in a Markov decisi...

05/08/2021  RAIL: A modular framework for Reinforcement-learning-based Adversarial Imitation Learning
    While Adversarial Imitation Learning (AIL) algorithms have recently led ...

04/29/2023  A Coupled Flow Approach to Imitation Learning
    In reinforcement learning and imitation learning, an object of central i...

06/11/2023  Provably Efficient Adversarial Imitation Learning with Unknown Transitions
    Imitation learning (IL) has proven to be an effective method for learnin...

05/30/2022  Minimax Optimal Online Imitation Learning via Replay Estimation
    Online imitation learning is the problem of how best to mimic expert dem...

05/29/2018  Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning
    In this paper, we propose to combine imitation and reinforcement learnin...

01/27/2023  Theoretical Analysis of Offline Imitation With Supplementary Dataset
    Behavioral cloning (BC) can recover a good policy from abundant expert d...
