Imitation Learning by Estimating Expertise of Demonstrators

02/02/2022
by Mark Beliaev, et al.

Many existing imitation learning datasets are collected from multiple demonstrators, each with different expertise in different parts of the environment. Yet standard imitation learning algorithms typically treat all demonstrators as homogeneous, regardless of their expertise, and so absorb the weaknesses of any suboptimal demonstrators. In this work, we show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms. We develop and optimize a joint model over a learned policy and the expertise levels of the demonstrators. This enables our model to learn from the optimal behavior and filter out the suboptimal behavior of each demonstrator. Our model learns a single policy that can outperform even the best demonstrator, and it can be used to estimate the expertise of any demonstrator at any state. We illustrate our findings on real-robotic continuous control tasks from Robomimic and on discrete environments such as MiniGrid and chess, outperforming competing methods in 21 out of 23 settings, with an average improvement of 7% and up to 60% in final reward.
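To make the idea of a joint model over a policy and demonstrator expertise concrete, here is a minimal toy sketch. It is not the model from the paper: it assumes a tabular setting, a single scalar expertise level per demonstrator (the abstract describes state-dependent expertise), and a simple noise model in which a demonstrator follows the shared policy with probability rho and otherwise acts uniformly at random. The fit uses expectation-maximization on that mixture. All names (`fit_policy_and_expertise`, `rho`, `pi`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_policy_and_expertise(states, actions, demo_ids,
                             n_states, n_actions, n_demos, n_iters=200):
    """EM for an assumed mixture: each demonstrator k follows the shared
    policy pi with probability rho[k], else picks a uniform random action."""
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    rho = np.full(n_demos, 0.5)
    per_demo = np.bincount(demo_ids, minlength=n_demos)
    for _ in range(n_iters):
        # E-step: probability that each sample came from the shared policy.
        from_policy = rho[demo_ids] * pi[states, actions]
        from_noise = (1.0 - rho[demo_ids]) / n_actions
        resp = from_policy / (from_policy + from_noise)
        # M-step for expertise: mean responsibility per demonstrator.
        rho = np.bincount(demo_ids, weights=resp, minlength=n_demos) / per_demo
        rho = np.clip(rho, 1e-6, 1.0 - 1e-6)
        # M-step for the policy: responsibility-weighted action counts.
        counts = np.full((n_states, n_actions), 1e-6)
        np.add.at(counts, (states, actions), resp)
        pi = counts / counts.sum(axis=1, keepdims=True)
    return pi, rho

# Synthetic demonstrations: three demonstrators of decreasing reliability.
rng = np.random.default_rng(0)
n_states, n_actions, n_per_demo = 5, 4, 2000
true_rho = np.array([0.9, 0.5, 0.1])
optimal = rng.integers(n_actions, size=n_states)  # hidden optimal action per state

states, actions, demo_ids = [], [], []
for k, r in enumerate(true_rho):
    s = rng.integers(n_states, size=n_per_demo)
    follows = rng.random(n_per_demo) < r
    a = np.where(follows, optimal[s], rng.integers(n_actions, size=n_per_demo))
    states.append(s)
    actions.append(a)
    demo_ids.append(np.full(n_per_demo, k))
states = np.concatenate(states)
actions = np.concatenate(actions)
demo_ids = np.concatenate(demo_ids)

pi_hat, rho_hat = fit_policy_and_expertise(
    states, actions, demo_ids, n_states, n_actions, len(true_rho))
print("estimated expertise:", np.round(rho_hat, 2))
print("greedy policy matches optimal:",
      np.array_equal(pi_hat.argmax(axis=1), optimal))
```

Under this assumed noise model, only the relative ordering of the expertise estimates is well determined, since a slightly noisier shared policy paired with proportionally higher expertise values explains the data equally well. The sketch is meant to show how samples from less reliable demonstrators get down-weighted when the policy is fit, not to reproduce the reported results.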

research
09/15/2019

VILD: Variational Imitation Learning with Diverse-quality Demonstrations

The goal of imitation learning (IL) is to learn a good policy from high-...
research
07/08/2019

On-Policy Robot Imitation Learning from a Converging Supervisor

Existing on-policy imitation learning algorithms, such as DAgger, assume...
research
06/11/2022

Model-based Offline Imitation Learning with Non-expert Data

Although Behavioral Cloning (BC) in theory suffers compounding errors, i...
research
12/02/2020

DERAIL: Diagnostic Environments for Reward And Imitation Learning

The objective of many real-world tasks is complex and difficult to proce...
research
04/27/2023

Learning Environment for the Air Domain (LEAD)

A substantial part of fighter pilot training is simulation-based and inv...
research
07/08/2021

Imitation by Predicting Observations

Imitation learning enables agents to reuse and adapt the hard-won expert...
research
09/06/2023

The Quiet Eye Phenomenon in Minimally Invasive Surgery

In this paper, we report our discovery of a gaze behavior called Quiet E...
