DERAIL: Diagnostic Environments for Reward And Imitation Learning

12/02/2020
by Pedro Freire, et al.

The objective of many real-world tasks is complex and difficult to specify procedurally. This makes it necessary to use reward or imitation learning algorithms to infer a reward or policy directly from human data. Existing benchmarks for these algorithms focus on realism, testing in complex environments. Unfortunately, these benchmarks are slow, unreliable and cannot isolate failures. As a complementary approach, we develop a suite of simple diagnostic tasks that test individual facets of algorithm performance in isolation. We evaluate a range of common reward and imitation learning algorithms on our tasks. Our results confirm that algorithm performance is highly sensitive to implementation details. Moreover, in a case study of a popular preference-based reward learning implementation, we illustrate how the suite can pinpoint design flaws and rapidly evaluate candidate solutions. The environments are available at https://github.com/HumanCompatibleAI/seals.

11/22/2022

imitation: Clean Imitation Learning Implementations

imitation provides open-source implementations of imitation and reward l...
05/25/2021

Hyperparameter Selection for Imitation Learning

We address the issue of tuning hyperparameters (HPs) for imitation learn...
02/02/2022

Imitation Learning by Estimating Expertise of Demonstrators

Many existing imitation learning datasets are collected from multiple de...
12/07/2021

Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft

Real-world tasks of interest are generally poorly defined by human-reada...
11/01/2020

The MAGICAL Benchmark for Robust Imitation

Imitation Learning (IL) algorithms are typically evaluated in the same e...
07/06/2020

Scaling Imitation Learning in Minecraft

Imitation learning is a powerful family of techniques for learning senso...
04/13/2022

A Study of Causal Confusion in Preference-Based Reward Learning

Learning robot policies via preference-based reward learning is an incre...