The MAGICAL Benchmark for Robust Imitation

by Sam Toyer et al.

Imitation Learning (IL) algorithms are typically evaluated in the same environment that was used to create demonstrations. This rewards precise reproduction of demonstrations in one particular environment, but provides little information about how robustly an algorithm can generalise the demonstrator's intent to substantially different deployment settings. This paper presents the MAGICAL benchmark suite, which permits systematic evaluation of generalisation by quantifying robustness to different kinds of distribution shift that an IL algorithm is likely to encounter in practice. Using the MAGICAL suite, we confirm that existing IL algorithms overfit significantly to the context in which demonstrations are provided. We also show that standard methods for reducing overfitting are effective at creating narrow perceptual invariances, but are not sufficient to enable transfer to contexts that require substantially different behaviour, which suggests that new approaches will be needed in order to robustly generalise demonstrator intent. Code and data for the MAGICAL suite are available at




Ranking-Based Reward Extrapolation without Rankings

The performance of imitation learning is typically upper-bounded by the ...

Adversarial Imitation Learning from Incomplete Demonstrations

Imitation learning targets deriving a mapping from states to actions, a....

DERAIL: Diagnostic Environments for Reward And Imitation Learning

The objective of many real-world tasks is complex and difficult to proce...

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

Effective exploration continues to be a significant challenge that preve...

Evaluating the Effectiveness of Corrective Demonstrations and a Low-Cost Sensor for Dexterous Manipulation

Imitation learning is a promising approach to help robots acquire dexter...

Simitate: A Hybrid Imitation Learning Benchmark

We present Simitate --- a hybrid benchmarking suite targeting the evalua...

Incremental learning of high-level concepts by imitation

Nowadays, robots are becoming companions in everyday life. To be well-accepte...

Code Repositories


The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)
