Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

04/12/2019
by Daniel S. Brown, et al.

A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is a consequence of the general reliance of IRL algorithms upon some form of mimicry, such as feature-count matching, rather than upon inferring the underlying intentions of the demonstrator, which may have been poorly executed in practice. In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations. We show that, when combined with deep reinforcement learning, this approach can achieve performance that is more than an order of magnitude better than the best-performing demonstration on multiple Atari and MuJoCo benchmark tasks. In contrast, prior state-of-the-art imitation learning and IRL methods fail to perform better than the demonstrator and often have performance that is orders of magnitude worse than T-REX. Finally, we demonstrate that T-REX is robust to modest amounts of ranking noise and can accurately extrapolate intention by simply watching a learner noisily improve at a task over time.
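The abstract does not spell out how a set of ranked demonstrations becomes a reward function. A minimal Python sketch follows. It assumes a Bradley-Terry-style pairwise ranking loss over predicted trajectory returns and a linear reward over hypothetical state features, whereas a full system would learn a deep network from raw observations. The function names `predicted_return`, `pair_loss_and_grad`, and `learn_reward`, the learning rate, and the toy data are illustrative assumptions, not the authors' code.

```python
import itertools

import numpy as np


def predicted_return(w, traj):
    # Hypothetical linear reward r(s) = w . phi(s), summed over the states of one trajectory.
    # `traj` is a (T, d) array whose rows are state features phi(s).
    return float(np.sum(traj @ w))


def pair_loss_and_grad(w, worse, better):
    # Assumed Bradley-Terry-style loss for a pair in which `better` is ranked above `worse`:
    #   L = -log( exp(R_better) / (exp(R_worse) + exp(R_better)) )
    a = predicted_return(w, worse)
    b = predicted_return(w, better)
    loss = float(np.logaddexp(a, b) - b)  # numerically stable form of L
    p_worse = 1.0 / (1.0 + np.exp(b - a))  # model probability that `worse` is preferred
    grad = p_worse * (worse.sum(axis=0) - better.sum(axis=0))  # dL/dw
    return loss, grad


def learn_reward(ranked_trajs, lr=0.01, epochs=200):
    # `ranked_trajs` is ordered from worst to best; every pair (i, j) with i < j is a training example.
    w = np.zeros(ranked_trajs[0].shape[1])
    for _ in range(epochs):
        for i, j in itertools.combinations(range(len(ranked_trajs)), 2):
            _, grad = pair_loss_and_grad(w, ranked_trajs[i], ranked_trajs[j])
            w -= lr * grad  # plain gradient descent on the ranking loss
    return w


# Toy data (hypothetical): feature 0 is task progress, feature 1 is a constant distractor.
# Demonstration k has progress k / 4, so the best demonstration reaches only 1.0.
demos = [np.column_stack([np.full(10, k / 4), np.ones(10)]) for k in range(5)]
w = learn_reward(demos)

# An unseen trajectory that makes more progress than any demonstration.
unseen = np.column_stack([np.full(10, 2.0), np.ones(10)])
print("learned weights:", w)
print("best demo return:", predicted_return(w, demos[-1]))
print("unseen return:", predicted_return(w, unseen))
```

Because the loss only constrains how ranked trajectories compare with one another, nothing caps the learned reward at the level of the best demonstration: in this toy run the unseen trajectory, which makes twice the progress of any demonstration, receives a higher predicted return than the best demonstration. The distractor feature never differs between trajectories, so its weight stays at zero.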
