Ranking-Based Reward Extrapolation without Rankings

07/09/2019
by Daniel S. Brown, et al.

The performance of imitation learning is typically upper-bounded by the performance of the demonstrator. Recent empirical results show that imitation learning via ranked demonstrations allows for better-than-demonstrator performance; however, ranked demonstrations may be difficult to obtain, and little is known theoretically about when such methods can be expected to outperform the demonstrator. To address these issues, we first contribute a sufficient condition for when better-than-demonstrator performance is possible and discuss why ranked demonstrations can contribute to better-than-demonstrator performance. Building on this theory, we then introduce Disturbance-based Reward Extrapolation (D-REX), a ranking-based imitation learning method that injects noise into a policy learned through behavioral cloning to automatically generate ranked demonstrations. By generating rankings automatically, ranking-based imitation learning can be applied in traditional imitation learning settings where only unlabeled demonstrations are available. We empirically validate our approach on standard MuJoCo and Atari benchmarks and show that D-REX can utilize automatic rankings to significantly surpass the performance of the demonstrator and outperform standard imitation learning approaches. D-REX is the first imitation learning approach to achieve significant extrapolation beyond the demonstrator's performance without additional side-information or supervision, such as rewards or human preferences.
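The core idea of D-REX's automatic ranking can be sketched in a few lines: roll out the behavior-cloned policy under increasing amounts of injected noise, and assume that a rollout generated with less noise outranks one generated with more noise. The sketch below is illustrative only. The toy chain environment, the epsilon-greedy noise model, and all function names are assumptions for this example, not the paper's MuJoCo/Atari setup.

```python
import random

def rollout_with_noise(policy, noise_level, horizon=20, rng=None):
    """Roll out `policy` on a toy 1-D chain environment, replacing the
    policy's action with a random one with probability `noise_level`.
    (Toy environment and epsilon-greedy perturbation are illustrative.)"""
    rng = rng or random.Random(0)
    state, traj = 0, []
    for _ in range(horizon):
        if rng.random() < noise_level:
            action = rng.choice([-1, +1])  # injected noise: random action
        else:
            action = policy(state)         # cloned policy's action
        traj.append((state, action))
        state += action                    # step along the chain
    return traj

def generate_ranked_pairs(policy, noise_levels, rollouts_per_level=3, seed=0):
    """D-REX-style automatic ranking: a trajectory produced with less
    noise is assumed to be better than one produced with more noise."""
    rng = random.Random(seed)
    demos = []  # (noise_level, trajectory)
    for eps in noise_levels:
        for _ in range(rollouts_per_level):
            demos.append((eps, rollout_with_noise(policy, eps, rng=rng)))
    # Emit (better, worse) pairs ordered by noise level alone; no reward
    # signal or human preference labels are used.
    return [(ti, tj) for ei, ti in demos for ej, tj in demos if ei < ej]

expert = lambda s: +1  # stand-in for a behavior-cloned policy
pairs = generate_ranked_pairs(expert, noise_levels=[0.0, 0.25, 0.5])
```

The resulting pairs can then be fed to a pairwise ranking loss (as in ranking-based reward learning) to train a reward model, which in turn supervises reinforcement learning that can extrapolate beyond the demonstrator.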


Related research

research · 11/21/2019 · State Alignment-based Imitation Learning
Consider an imitation learning problem that the imitator and the expert ...

research · 02/07/2022 · A Ranking Game for Imitation Learning
We propose a new framework for imitation learning - treating imitation a...

research · 11/01/2020 · The MAGICAL Benchmark for Robust Imitation
Imitation Learning (IL) algorithms are typically evaluated in the same e...

research · 04/12/2019 · Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
A critical flaw of existing inverse reinforcement learning (IRL) methods...

research · 01/03/2023 · Explaining Imitation Learning through Frames
As one of the prevalent methods to achieve automation systems, Imitation...

research · 10/27/2021 · Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality
Most existing imitation learning approaches assume the demonstrations ar...

research · 04/15/2022 · Evaluating the Effectiveness of Corrective Demonstrations and a Low-Cost Sensor for Dexterous Manipulation
Imitation learning is a promising approach to help robots acquire dexter...
