Learning from Imperfect Demonstrations via Adversarial Confidence Transfer

02/07/2022
by   Zhangjie Cao, et al.

Existing learning-from-demonstration algorithms usually assume access to expert demonstrations. However, this assumption is limiting in many real-world applications, since collected demonstrations may be suboptimal or may even include failure cases. We therefore study the problem of learning from imperfect demonstrations by learning a confidence predictor. Specifically, we rely on demonstrations and their confidence values from a different but corresponding environment (the source environment) to learn a confidence predictor for the environment in which we aim to learn a policy (the target environment), where we only have unlabeled demonstrations. We learn a common latent space through adversarial distribution matching of multi-length partial trajectories to enable the transfer of confidence across the source and target environments. The learned confidence reweights the demonstrations, so that the policy learns more from informative demonstrations and discards irrelevant ones. Our experiments in three simulated environments and a real robot reaching task demonstrate that our approach learns a policy with the highest expected return.
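The reweighting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `confidence_weighted_loss`, the squared-error imitation loss, and the normalization by the sum of confidences are all assumptions made for illustration, and the confidence values are taken as given rather than produced by a learned predictor.

```python
def confidence_weighted_loss(pred_actions, demo_actions, confidence):
    """Confidence-weighted squared error between policy and demonstration actions.

    pred_actions, demo_actions: lists of equal-length action vectors.
    confidence: one non-negative weight per demonstration step
                (hypothetical; assumed to come from a confidence predictor).
    """
    if not (len(pred_actions) == len(demo_actions) == len(confidence)):
        raise ValueError("inputs must have the same number of samples")
    total = sum(confidence)
    if total <= 0:
        raise ValueError("confidence weights must have a positive sum")

    weighted = 0.0
    for pred, demo, weight in zip(pred_actions, demo_actions, confidence):
        # Squared error for one sample, summed over action dimensions.
        sq_err = sum((p - d) ** 2 for p, d in zip(pred, demo))
        # Low-confidence samples contribute little; zero-confidence ones are discarded.
        weighted += weight * sq_err
    return weighted / total


# Two samples: the policy matches the first and misses the second.
pred = [[0.0, 0.0], [1.0, 1.0]]
demo = [[0.0, 0.0], [0.0, 0.0]]
loss_trust_first = confidence_weighted_loss(pred, demo, [1.0, 0.0])
loss_trust_both = confidence_weighted_loss(pred, demo, [1.0, 1.0])
```

With all confidence on the first sample, the mismatched second sample is ignored and the loss is zero. With equal confidence, both samples count equally.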


research
01/27/2019

Imitation Learning from Imperfect Demonstration

Imitation learning (IL) aims to learn an optimal policy from demonstrati...
research
10/28/2021

Learning Feasibility to Imitate Demonstrators with Different Dynamics

The goal of learning from demonstrations is to learn a policy for an age...
research
10/27/2021

Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality

Most existing imitation learning approaches assume the demonstrations ar...
research
01/04/2021

Robust Maximum Entropy Behavior Cloning

Imitation learning (IL) algorithms use expert demonstrations to learn a ...
research
06/10/2020

Bayesian Experience Reuse for Learning from Multiple Demonstrators

Learning from demonstrations (LfD) improves the exploration efficiency o...
research
06/14/2020

Reinforcement Learning with Supervision from Noisy Demonstrations

Reinforcement learning has achieved great success in various application...
research
12/18/2019

Hierarchical Deep Q-Network with Forgetting from Imperfect Demonstrations in Minecraft

We present hierarchical Deep Q-Network with Forgetting (HDQF) that took ...
