DeepAI AI Chat
Log In Sign Up

Strictly Batch Imitation Learning by Energy-based Distribution Matching

by   Daniel Jarrett, et al.

Consider learning a policy purely on the basis of demonstrated behavior—that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. This *strictly batch imitation learning* problem arises wherever live experimentation is costly, such as in healthcare. One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting. But such an approach bargains heavily on model estimation or off-policy evaluation, and can be indirect and inefficient. We argue that a good solution should be able to explicitly parameterize a policy (i.e. respecting action conditionals), implicitly account for rollout dynamics (i.e. respecting state marginals), and—crucially—operate in an entirely offline fashion. To meet this challenge, we propose a novel technique by *energy-based distribution matching* (EDM): By identifying parameterizations of the (discriminative) model of a policy with the (generative) energy function for state distributions, EDM provides a simple and effective solution that equivalently minimizes a divergence between the occupancy measures of the demonstrator and the imitator. Through experiments with application to control tasks and healthcare settings, we illustrate consistent performance gains over existing algorithms for strictly batch imitation learning.


page 1

page 2

page 3

page 4


SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

We propose State Matching Offline DIstribution Correction Estimation (SM...

A Critique of Strictly Batch Imitation Learning

Recent work by Jarrett et al. attempts to frame the problem of offline i...

SoftDICE for Imitation Learning: Rethinking Off-policy Distribution Matching

We present SoftDICE, which achieves state-of-the-art performance for imi...

Energy-Based Imitation Learning

We tackle a common scenario in imitation learning (IL), where agents try...

Truly Batch Apprenticeship Learning with Deep Successor Features

We introduce a novel apprenticeship learning algorithm to learn an exper...

Scalable Bayesian Inverse Reinforcement Learning

Bayesian inference over the reward presents an ideal solution to the ill...

Provable Representation Learning for Imitation with Contrastive Fourier Features

In imitation learning, it is common to learn a behavior policy to match ...