Learning from Suboptimal Demonstration via Self-Supervised Reward Regression

10/17/2020
by Letian Chen, et al.

Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform a task by providing a human demonstration. However, modern LfD techniques, such as inverse reinforcement learning (IRL), assume that users provide at least stochastically optimal demonstrations. This assumption fails to hold in all but the most isolated, controlled scenarios, which undermines the goal of empowering real end-users. Recent attempts to learn from suboptimal demonstration leverage pairwise rankings through Preference-based Reinforcement Learning (PbRL) to infer a policy that is more optimal than the demonstration. We show that these approaches make incorrect assumptions and, consequently, suffer from brittle, degraded performance. In this paper, we overcome the limitations of prior work by developing a novel computational technique that infers an idealized reward function from suboptimal demonstration and bootstraps suboptimal demonstrations to synthesize optimality-parameterized data for training our reward function. We empirically validate that we can learn an idealized reward function with ∼0.95 correlation with the ground-truth reward, versus only ∼0.75 for prior work. We can then train policies that achieve ∼200% improvement over the suboptimal demonstration and ∼90% improvement over prior work. Finally, we present a real-world implementation for teaching a robot to hit a topspin shot in table tennis better than the user demonstration.
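The two headline metrics in the abstract (correlation between learned and ground-truth reward, and percentage improvement over a baseline) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `pearson` and `percent_improvement`, the choice of Pearson correlation, the definition of improvement relative to the baseline return, and all numbers below are assumptions made purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of returns.

    Assumption: the abstract does not say which correlation measure is used;
    Pearson is chosen here only as a common default.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def percent_improvement(new_return, baseline_return):
    """Relative improvement of new_return over baseline_return, in percent.

    Assumption: improvement is measured relative to the baseline's magnitude.
    """
    if baseline_return == 0:
        raise ValueError("baseline return must be non-zero")
    return 100.0 * (new_return - baseline_return) / abs(baseline_return)


# Hypothetical per-trajectory returns, made up for this example.
ground_truth_returns = [1.0, 2.0, 3.0, 4.0, 5.0]
learned_returns = [2.1, 3.9, 6.2, 8.0, 9.8]

correlation = pearson(ground_truth_returns, learned_returns)
improvement = percent_improvement(new_return=30.0, baseline_return=10.0)
print(f"correlation: {correlation:.3f}, improvement: {improvement:.0f}%")
```

Under this definition, a policy return of 30 against a demonstration return of 10 corresponds to a 200% improvement; the paper itself may normalize returns differently.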

research
10/08/2021

Towards Sample-efficient Apprenticeship Learning from Suboptimal Demonstration

Learning from Demonstration (LfD) seeks to democratize robotics by enabl...
research
02/14/2018

Reinforcement Learning from Imperfect Demonstrations

Robust real-world learning should benefit from both demonstrations and i...
research
03/02/2023

Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning

In imitation and reinforcement learning, the cost of human supervision l...
research
10/09/2021

Credit Assignment Safety Learning from Human Demonstrations

A critical need in assistive robotics, such as assistive wheelchairs for...
research
10/07/2021

Improving Robot-Centric Learning from Demonstration via Personalized Embeddings

Learning from demonstration (LfD) techniques seek to enable novice users...
research
08/07/2019

Task-Oriented Optimal Sequencing of Visualization Charts

A chart sequence is used to describe a series of visualization charts ge...
research
06/07/2021

XIRL: Cross-embodiment Inverse Reinforcement Learning

We investigate the visual cross-embodiment imitation setting, in which a...