Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency

12/11/2021
by Mingfei Sun, et al.

Sample efficiency is crucial for imitation learning methods to be practical in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to the off-policy setting, even though these extensions can either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy, sample-efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a deterministic policy via off-policy reinforcement learning. Our empirical results show that D2-Imitation achieves good sample efficiency, outperforming several off-policy extensions of adversarial imitation on many control tasks.
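
The two-buffer partitioning step mentioned in the abstract might look roughly like the sketch below. The abstract does not state how samples are assigned to buffers or how rewards are defined, so everything concrete here is an assumption made for illustration: the names `partition_samples` and `sample_labeled_batch`, the scoring function `score_fn` (a stand-in for some learned discriminator), the 0.5 threshold, and the constant 1/0 reward labels attached to the two buffers. It is not the authors' implementation.

```python
import random
from collections import deque

def partition_samples(transitions, score_fn, threshold=0.5, capacity=10_000):
    """Split (state, action, next_state) transitions into two replay buffers.

    score_fn is a hypothetical stand-in for a discriminator that rates how
    expert-like a state-action pair is; the threshold is also an assumption.
    """
    positive = deque(maxlen=capacity)  # transitions scored as expert-like
    negative = deque(maxlen=capacity)  # all remaining transitions
    for transition in transitions:
        state, action, _next_state = transition
        if score_fn(state, action) >= threshold:
            positive.append(transition)
        else:
            negative.append(transition)
    return positive, negative

def sample_labeled_batch(positive, negative, batch_size, rng):
    """Draw about half a batch from each buffer and attach a constant reward.

    The labels (1.0 for the positive buffer, 0.0 for the negative buffer) are
    an illustrative assumption; an off-policy learner with a deterministic
    policy could then consume these (transition, reward) pairs.
    """
    half = batch_size // 2
    pos = rng.sample(list(positive), min(half, len(positive)))
    neg = rng.sample(list(negative), min(batch_size - half, len(negative)))
    return [(t, 1.0) for t in pos] + [(t, 0.0) for t in neg]

# Toy usage: one-dimensional states, and a fake score that treats large states as expert-like.
toy_transitions = [((i,), (0,), (i + 1,)) for i in range(10)]
toy_score = lambda state, action: 1.0 if state[0] >= 5 else 0.0
pos_buffer, neg_buffer = partition_samples(toy_transitions, toy_score)
batch = sample_labeled_batch(pos_buffer, neg_buffer, 4, random.Random(0))
```

Keeping the two buffers separate means the reward signal comes from buffer membership rather than from a discriminator queried inside a min-max loop, which is one plausible reading of how the approach avoids adversarial training.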

