A Strong Baseline for Batch Imitation Learning

by Matthew Smith, et al.

Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making. We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm, in which the agent must learn solely from data collected a priori. This paradigm allows our algorithm to be used in environments where safety or cost is of critical concern. Our algorithm requires no hyperparameter tuning beyond that of any standard batch reinforcement learning (RL) algorithm, making it an ideal baseline for such data-strict regimes. Furthermore, we provide formal sample complexity guarantees for the algorithm in finite Markov Decision Problems. In doing so, we formally prove a previously unproven claim from Kearns & Singh (1998). On the empirical side, our contribution is twofold. First, we develop a practical, robust and principled evaluation protocol for offline RL methods that uses only the provided dataset for model selection. This stands in contrast to the vast majority of previous work in offline RL, which tunes hyperparameters on the evaluation environment and so limits practical applicability when methods are deployed in new, cost-critical environments. As such, we establish a precedent for the development and fair evaluation of offline RL algorithms. Second, we evaluate our own algorithm on challenging continuous control benchmarks, demonstrating its practical applicability and its competitiveness with state-of-the-art methods, despite its simplicity.
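The abstract does not describe how its dataset-only evaluation protocol works. The sketch below is only a generic illustration of the constraint it names: hyperparameters are chosen from the fixed dataset alone, with no environment rollouts. A tabular behaviour-cloning policy's smoothing strength is tuned by log-likelihood on a held-out split of the offline data. The validation fraction, the smoothing parameter `alpha`, the log-likelihood criterion and every function name are hypothetical choices made for this example. This is not the authors' algorithm or protocol.

```python
import math
import random
from collections import Counter, defaultdict


def fit_policy(transitions, n_actions, alpha):
    """Hypothetical tabular behaviour-cloning policy with add-alpha smoothing."""
    assert alpha > 0, "alpha must be positive so every action keeps nonzero probability"
    counts = defaultdict(Counter)  # state -> Counter of actions taken there
    for s, a in transitions:
        counts[s][a] += 1

    def prob(s, a):
        c = counts[s]
        total = sum(c.values())
        # Unseen states fall back to the uniform distribution 1 / n_actions.
        return (c[a] + alpha) / (total + alpha * n_actions)

    return prob


def held_out_log_likelihood(prob, transitions):
    """Sum of log pi(a | s) over held-out (state, action) pairs."""
    return sum(math.log(prob(s, a)) for s, a in transitions)


def select_alpha(dataset, n_actions, candidates, val_fraction=0.2, seed=0):
    """Pick alpha using only the offline dataset (no environment access)."""
    data = list(dataset)
    random.Random(seed).shuffle(data)
    n_val = max(1, int(len(data) * val_fraction))
    val, train = data[:n_val], data[n_val:]
    scores = {}
    for alpha in candidates:
        policy = fit_policy(train, n_actions, alpha)
        scores[alpha] = held_out_log_likelihood(policy, val)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical noise-free demonstrations: in state s the expert plays s % 3.
demos = [(s % 5, (s % 5) % 3) for s in range(100)]
best_alpha, scores = select_alpha(demos, n_actions=3, candidates=[0.01, 0.1, 1.0, 10.0])
print(best_alpha, scores)
```

With noise-free demonstrations, lighter smoothing always assigns more probability to the expert's action, so the smallest candidate is selected. With noisy or multi-modal data, the held-out criterion can favour larger `alpha` instead. The same split-and-score pattern applies to any hyperparameter, provided the score is computed from the dataset rather than from evaluation-environment returns.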
