Strictly Batch Imitation Learning by Energy-based Distribution Matching

06/25/2020
by Daniel Jarrett, et al.

Consider learning a policy purely on the basis of demonstrated behavior, that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. This *strictly batch imitation learning* problem arises wherever live experimentation is costly, such as in healthcare. One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting. But such an approach relies heavily on model estimation or off-policy evaluation, and can be indirect and inefficient. We argue that a good solution should be able to explicitly parameterize a policy (i.e., respecting action conditionals), implicitly account for rollout dynamics (i.e., respecting state marginals), and, crucially, operate in an entirely offline fashion. To meet this challenge, we propose a novel technique, *energy-based distribution matching* (EDM): by identifying parameterizations of the (discriminative) model of a policy with the (generative) energy function for state distributions, EDM provides a simple and effective solution that equivalently minimizes a divergence between the occupancy measures of the demonstrator and the imitator. Through experiments with application to control tasks and healthcare settings, we illustrate consistent performance gains over existing algorithms for strictly batch imitation learning.
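
The identification of a policy model with a state energy function can be made concrete with a small sketch. It assumes the usual joint energy-based-model convention: a single set of logits f(s)[a] yields the action conditional via a softmax, and the same logits yield a state energy E(s) = -logsumexp over actions, so that exp(-E(s)) is an unnormalized state density. The function names and the example logits are hypothetical, and this is an illustration of the idea only, not the authors' implementation or training procedure.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def policy(logits):
    """Action conditional pi(a|s): softmax over the logits f(s)[a]."""
    lse = logsumexp(logits)
    return [math.exp(v - lse) for v in logits]

def state_energy(logits):
    """State energy E(s) = -logsumexp_a f(s)[a] (assumed convention).

    exp(-E(s)) is then an unnormalized density over states.
    """
    return -logsumexp(logits)

# Hypothetical logits for one state with three actions.
logits = [2.0, 0.5, -1.0]
pi = policy(logits)
energy = state_energy(logits)

# Shifting every logit by a constant leaves the policy unchanged
# but lowers the state energy by that constant.
shifted = [v + 3.0 for v in logits]
pi_shifted = policy(shifted)
energy_shifted = state_energy(shifted)
```

The last lines show why the two roles can share parameters: the softmax ignores a per-state offset in the logits, so that offset is free to encode how likely the state itself is.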

research
02/04/2022

SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

We propose State Matching Offline DIstribution Correction Estimation (SM...
research
10/05/2021

A Critique of Strictly Batch Imitation Learning

Recent work by Jarrett et al. attempts to frame the problem of offline i...
research
06/06/2021

SoftDICE for Imitation Learning: Rethinking Off-policy Distribution Matching

We present SoftDICE, which achieves state-of-the-art performance for imi...
research
04/20/2020

Energy-Based Imitation Learning

We tackle a common scenario in imitation learning (IL), where agents try...
research
03/24/2019

Truly Batch Apprenticeship Learning with Deep Successor Features

We introduce a novel apprenticeship learning algorithm to learn an exper...
research
04/29/2023

A Coupled Flow Approach to Imitation Learning

In reinforcement learning and imitation learning, an object of central i...
research
02/12/2021

Scalable Bayesian Inverse Reinforcement Learning

Bayesian inference over the reward presents an ideal solution to the ill...
