
Ranking-Based Reward Extrapolation without Rankings

by Daniel S. Brown, et al.

The performance of imitation learning is typically upper-bounded by the performance of the demonstrator. Recent empirical results show that imitation learning via ranked demonstrations allows for better-than-demonstrator performance; however, ranked demonstrations may be difficult to obtain, and little is known theoretically about when such methods can be expected to outperform the demonstrator. To address these issues, we first contribute a sufficient condition for when better-than-demonstrator performance is possible and discuss why ranked demonstrations can contribute to better-than-demonstrator performance. Building on this theory, we then introduce Disturbance-based Reward Extrapolation (D-REX), a ranking-based imitation learning method that injects noise into a policy learned through behavioral cloning to automatically generate ranked demonstrations. By generating rankings automatically, ranking-based imitation learning can be applied in traditional imitation learning settings where only unlabeled demonstrations are available. We empirically validate our approach on standard MuJoCo and Atari benchmarks and show that D-REX can utilize automatic rankings to significantly surpass the performance of the demonstrator and outperform standard imitation learning approaches. D-REX is the first imitation learning approach to achieve significant extrapolation beyond the demonstrator's performance without additional side-information or supervision, such as rewards or human preferences.
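The core mechanism described above (injecting increasing amounts of noise into a behavioral-cloning policy so that the resulting trajectories come with automatic rankings) can be illustrated with a small sketch. Everything here is a toy stand-in, not the paper's implementation: the 1-D "reach the goal" task, `bc_policy`, and the noise schedule are hypothetical, whereas the real D-REX experiments use MuJoCo and Atari policies.

```python
import random

# Hypothetical toy task: start at state 0, try to reach GOAL. This is an
# illustrative stand-in for the MuJoCo/Atari policies used in the paper.
GOAL = 10

def bc_policy(state):
    """Stand-in for a policy learned via behavioral cloning:
    deterministically step toward the goal."""
    return 1 if state < GOAL else 0

def rollout(policy, epsilon, horizon=20, seed=0):
    """Run the policy with epsilon-greedy noise injection and
    return the visited states."""
    rng = random.Random(seed)
    state, traj = 0, []
    for _ in range(horizon):
        if rng.random() < epsilon:
            action = rng.choice([-1, 0, 1])  # injected random action
        else:
            action = policy(state)
        state += action
        traj.append(state)
    return traj

# D-REX's key observation: more injected noise degrades behavior on average,
# so trajectories generated at increasing noise levels are automatically
# ranked (lower noise => preferred) without any human labels.
noise_schedule = [0.0, 0.25, 0.5, 0.75, 1.0]
ranked_demos = [rollout(bc_policy, eps, seed=i)
                for i, eps in enumerate(noise_schedule)]
# ranked_demos[0] is (approximately) the best trajectory and
# ranked_demos[-1] the worst; these pairwise preferences would then train
# a reward model (e.g. with a Bradley-Terry ranking loss, as in T-REX),
# followed by RL on the learned reward to extrapolate past the demonstrator.
```

The noiseless rollout reaches the goal and stays there, while high-noise rollouts wander, which is exactly the monotonic quality degradation that lets the rankings be generated for free.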


