Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality

10/27/2021
by Songyuan Zhang, et al.

Most existing imitation learning approaches assume the demonstrations are drawn from experts who are optimal, but relaxing this assumption enables us to use a wider range of data. Standard imitation learning may learn a suboptimal policy from demonstrations with varying optimality. Prior works use confidence scores or rankings to capture beneficial information from demonstrations with varying optimality, but they have notable limitations, e.g., requiring manually annotated confidence scores or assuming a high average optimality of demonstrations. In this paper, we propose a general framework to learn from demonstrations with varying optimality that jointly learns the confidence score and a well-performing policy. Our approach, Confidence-Aware Imitation Learning (CAIL), learns a well-performing policy from confidence-reweighted demonstrations, while using an outer loss to track the performance of our model and to learn the confidence. We provide theoretical guarantees on the convergence of CAIL and evaluate its performance in both simulated and real robot experiments. Our results show that CAIL significantly outperforms other imitation learning methods when learning from demonstrations with varying optimality. We further show that even without access to any optimal demonstrations, CAIL can still learn a successful policy and outperform prior work.
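
The joint learning of a confidence score and a policy described in the abstract can be illustrated with a toy bi-level loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the policy is a hypothetical 1-D linear map a = theta * s, the inner step is a confidence-weighted least-squares fit, and the outer loss is a stand-in squared error on one hypothetical evaluation pair (the abstract does not specify the outer loss). All names (`fit_policy`, `outer_loss`, `learn_confidence`, `DEMOS`, `EVAL_PAIRS`) are invented for illustration.

```python
# Toy sketch of confidence-reweighted imitation with an outer loss.
# Assumptions (not from the paper): linear 1-D policy, squared-error losses,
# finite-difference gradients, and a hypothetical evaluation set.

# (state, action) pairs: the first three follow a = 2*s, the last three a = -s.
DEMOS = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0),
         (1.0, -1.0), (2.0, -2.0), (3.0, -3.0)]
# Hypothetical stand-in signal used only by the outer loss.
EVAL_PAIRS = [(1.5, 3.0)]


def fit_policy(demos, conf):
    """Inner step: confidence-weighted least squares for a = theta * s."""
    num = sum(c * s * a for (s, a), c in zip(demos, conf))
    den = sum(c * s * s for (s, _), c in zip(demos, conf))
    return num / den


def outer_loss(theta, eval_pairs):
    """Outer step: mean squared error of the fitted policy on eval_pairs."""
    return sum((theta * s - a) ** 2 for s, a in eval_pairs) / len(eval_pairs)


def learn_confidence(demos, eval_pairs, steps=200, lr=0.5, eps=1e-4):
    """Alternate between refitting the policy and updating the confidences."""
    conf = [1.0] * len(demos)  # start by trusting every demonstration equally
    for _ in range(steps):
        base = outer_loss(fit_policy(demos, conf), eval_pairs)
        grads = []
        for i in range(len(conf)):
            bumped = conf[:]
            bumped[i] += eps
            bumped_loss = outer_loss(fit_policy(demos, bumped), eval_pairs)
            grads.append((bumped_loss - base) / eps)
        # Gradient step on each confidence, clipped to [0.01, 1.0].
        conf = [min(1.0, max(0.01, c - lr * g)) for c, g in zip(conf, grads)]
    return conf, fit_policy(demos, conf)


if __name__ == "__main__":
    uniform_theta = fit_policy(DEMOS, [1.0] * len(DEMOS))
    conf, theta = learn_confidence(DEMOS, EVAL_PAIRS)
    print("theta with uniform confidence:", uniform_theta)
    print("learned confidences:", conf)
    print("theta with learned confidence:", theta)
```

In this toy setting, uniform confidence averages the two groups of demonstrations, while the learned confidences down-weight the group that disagrees with the outer signal. The finite-difference gradient keeps the sketch short; a practical implementation would likely differentiate through the inner update instead.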

research
03/10/2021

Learning from Imperfect Demonstrations from Agents with Varying Dynamics

Imitation learning enables robots to learn from demonstrations. Previous...
research
01/27/2019

Imitation Learning from Imperfect Demonstration

Imitation learning (IL) aims to learn an optimal policy from demonstrati...
research
07/09/2019

Ranking-Based Reward Extrapolation without Rankings

The performance of imitation learning is typically upper-bounded by the ...
research
06/19/2018

Unsupervised Imitation Learning

We introduce a novel method to learn a policy from unsupervised demonstr...
research
02/07/2022

Learning from Imperfect Demonstrations via Adversarial Confidence Transfer

Existing learning from demonstration algorithms usually assume access to...
research
01/03/2023

Explaining Imitation Learning through Frames

As one of the prevalent methods to achieve automation systems, Imitation...
research
11/13/2022

Out-of-Dynamics Imitation Learning from Multimodal Demonstrations

Existing imitation learning works mainly assume that the demonstrator wh...
