Off-Policy Imitation Learning from Observations

02/25/2021
by Zhuangdi Zhu, et al.

Learning from Observations (LfO) is a practical reinforcement learning scenario in which many applications can benefit from reusing incomplete demonstration resources. Compared to conventional imitation learning (IL), LfO is more challenging because expert actions are unavailable for guidance. Distribution matching lies at the heart of both conventional IL and LfO. Traditional distribution-matching approaches are sample-costly, as they depend on on-policy transitions for policy learning. To improve sample efficiency, several off-policy solutions have been proposed, but they either lack comprehensive theoretical justification or rely on the guidance of expert actions. In this work, we propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner. To further accelerate learning, we regularize the policy update with an inverse action model, which assists distribution matching from a mode-covering perspective. Extensive empirical results on challenging locomotion tasks indicate that our approach is comparable to the state of the art in terms of both sample efficiency and asymptotic performance.
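
The inverse action model mentioned in the abstract can be illustrated with a minimal sketch. Assuming a PyTorch setup, an inverse dynamics network f(s, s') -> a is fit on the agent's own transitions and then used to infer pseudo-actions for the expert's state-only demonstrations, which can in turn regularize the policy update (e.g., via a behavior-cloning term). The class and function names, network sizes, and the agent_batches iterable below are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn as nn


class InverseActionModel(nn.Module):
    """Predicts the action that connects a state pair (s, s')."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, s, s_next):
        # Concatenate the state pair and regress the connecting action.
        return self.net(torch.cat([s, s_next], dim=-1))


def train_inverse_model(model, agent_batches, lr=3e-4, epochs=1):
    """Fit the inverse model on (s, a, s') tuples collected by the agent."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for s, a, s_next in agent_batches:  # agent_batches: iterable of tensors (assumed)
            loss = nn.functional.mse_loss(model(s, s_next), a)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model


def label_expert_transitions(model, expert_s, expert_s_next):
    """Infer actions for expert (s, s') pairs; these pseudo-labels can then
    regularize the policy update, e.g., through a behavior-cloning loss."""
    with torch.no_grad():
        return model(expert_s, expert_s_next)
```

In this sketch, the inferred expert actions serve only as an auxiliary signal for the policy update; the distribution-matching objective itself would still be optimized off-policy as described in the abstract.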

research  06/16/2023
Sample-Efficient On-Policy Imitation Learning from Observations
Imitation learning from demonstrations (ILD) aims to alleviate numerous ...

research  10/10/2019
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
This paper studies Learning from Observations (LfO) for imitation learni...

research  12/10/2019
Imitation Learning via Off-Policy Distribution Matching
When performing imitation learning from expert demonstrations, distribut...

research  12/11/2021
Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency
Sample efficiency is crucial for imitation learning methods to be applic...

research  04/20/2020
Energy-Based Imitation Learning
We tackle a common scenario in imitation learning (IL), where agents try...

research  07/01/2020
Fighting Failures with FIRE: Failure Identification to Reduce Expert Burden in Intervention-Based Learning
Supervised imitation learning, also known as behavior cloning, suffers f...

research  01/30/2018
Learning to Emulate an Expert Projective Cone Scheduler
Projective cone scheduling defines a large class of rate-stabilizing pol...
