Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

10/10/2019
by Chao Yang, et al.

This paper studies Learning from Observations (LfO) for imitation learning with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, LfO is more practical because it can leverage previously inapplicable resources (e.g., videos), yet it is more challenging because the expert guidance is incomplete. In this paper, we investigate LfO and how it differs from LfD from both theoretical and practical perspectives. We first prove that, when following the modeling approach of GAIL, the gap between LfD and LfO lies in the disagreement between the inverse dynamics models of the imitator and the expert. More importantly, we show that this gap is upper-bounded by a negative causal entropy term, which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM); it enhances the conventional LfO method by further bridging the gap to LfD. Extensive empirical results on challenging benchmarks indicate that our method achieves consistent improvements over other LfO counterparts.
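The first theoretical claim can be checked numerically on a toy problem. The sketch below is our own illustration, not code from the paper: it assumes a small discrete MDP with hypothetical sizes, random occupancy measures standing in for the imitator's and the expert's, and transition dynamics T(s' | s, a) shared by both. Under that assumption, the KL divergence between state-action occupancies (an LfD-style objective) minus the KL divergence between state-transition occupancies (an LfO-style objective) equals the imitator-weighted KL divergence between the two inverse dynamics models p(a | s, s'). This is the chain rule for KL divergence; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 3, 2  # hypothetical toy sizes: number of states and actions


def random_dist(shape):
    """Random strictly positive array normalised to sum to 1 over its last axis."""
    x = rng.random(shape) + 0.1
    return x / x.sum(axis=-1, keepdims=True)


def kl(p, q):
    """KL divergence between two discrete distributions of the same shape."""
    return float(np.sum(p * np.log(p / q)))


# Assumption: imitator and expert share the dynamics T[s, a, s'] = P(s' | s, a).
T = random_dist((S, A, S))

# Hypothetical state-action occupancy measures for the imitator (pi) and expert (E).
rho_pi_sa = random_dist(S * A).reshape(S, A)
rho_E_sa = random_dist(S * A).reshape(S, A)

# Joint occupancy over (s, a, s') induced by the shared dynamics.
joint_pi = rho_pi_sa[:, :, None] * T
joint_E = rho_E_sa[:, :, None] * T

# LfD-style divergence over (s, a).
kl_lfd = kl(rho_pi_sa, rho_E_sa)

# LfO-style divergence over (s, s'), obtained by marginalising out the action.
rho_pi_ss = joint_pi.sum(axis=1)
rho_E_ss = joint_E.sum(axis=1)
kl_lfo = kl(rho_pi_ss, rho_E_ss)

# Inverse dynamics p(a | s, s') for each party, indexed as [s, a, s'].
inv_pi = joint_pi / rho_pi_ss[:, None, :]
inv_E = joint_E / rho_E_ss[:, None, :]

# Expected KL between the inverse dynamics models, weighted by the imitator's occupancy.
inverse_disagreement = float(np.sum(joint_pi * np.log(inv_pi / inv_E)))

gap = kl_lfd - kl_lfo
print(f"LfD KL={kl_lfd:.6f}  LfO KL={kl_lfo:.6f}  gap={gap:.6f}  "
      f"inverse-dynamics disagreement={inverse_disagreement:.6f}")
```

The two printed quantities agree up to floating-point error only because both parties share T; the entropy-based upper bound and the IDDM training objective itself are not reproduced here.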

research
02/25/2021

Off-Policy Imitation Learning from Observations

Learning from Observations (LfO) is a practical reinforcement learning s...
research
04/07/2020

State-Only Imitation Learning for Dexterous Manipulation

Dexterous manipulation has been a long-standing challenge in robotics. R...
research
12/16/2019

To Follow or not to Follow: Selective Imitation Learning from Observations

Learning from demonstrations is a useful way to transfer a skill from on...
research
06/18/2019

RIDM: Reinforced Inverse Dynamics Modeling for Learning from a Single Observed Demonstration

Imitation learning has long been an approach to alleviate the tractabili...
research
02/22/2021

Optimism is All You Need: Model-Based Imitation Learning From Observation Alone

This paper studies Imitation Learning from Observations alone (ILFO) whe...
research
10/16/2020

On the Guaranteed Almost Equivalence between Imitation Learning from Observation and Demonstration

Imitation learning from observation (LfO) is more preferable than imitat...
research
08/03/2020

Concurrent Training Improves the Performance of Behavioral Cloning from Observation

Learning from demonstration is widely used as an efficient way for robot...