Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization

03/04/2022
by Minghuan Liu, et al.

Recent progress in state-only imitation learning extends imitation learning to real-world settings by removing the need to observe expert actions. However, existing solutions only learn a state-to-action mapping policy from the data, without considering how the expert plans toward the target. This limits how well demonstrations can be leveraged and restricts the flexibility of the learned policy. In this paper, we introduce Decoupled Policy Optimization (DePO), which explicitly decouples the policy into a high-level state planner and an inverse dynamics model. With an embedded decoupled policy gradient and generative adversarial training, DePO enables knowledge transfer to different action spaces or state transition dynamics, and can generalize the planner to state regions not covered by the demonstrations. Our in-depth experimental analysis shows that DePO learns a generalized target state planner while achieving the best imitation performance. We further demonstrate how DePO can transfer across different tasks through pre-training, and its potential for co-training agents with various skills.
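To illustrate the decoupling described above, the following is a minimal sketch, assuming a hypothetical 1-D point-mass task whose target is the origin. The policy is composed as a = I(s, h(s)): a planner h proposes the next target state, and an inverse dynamics model I infers the action that reaches it. In DePO both parts are learned networks (and the policy is stochastic) trained with the decoupled policy gradient and adversarial training; here they are hand-written deterministic rules, and every name (`planner`, `make_inverse_dynamics`, `make_env_step`, `decoupled_policy`, `rollout`) is hypothetical. The sketch only shows why the same planner can be reused under different dynamics by swapping the inverse dynamics model.

```python
from typing import Callable, List

# Hypothetical toy setting (not from the paper): the state is a 1-D position.
State = float
Action = float


def planner(s: State, max_step: float = 1.0) -> State:
    """High-level state planner h(s) -> s': propose the next target state.

    Hand-written rule: move at most `max_step` toward the origin.
    (In DePO this would be a learned model trained from state-only demos.)
    """
    delta = max(-max_step, min(max_step, -s))
    return s + delta


def make_inverse_dynamics(gain: float) -> Callable[[State, State], Action]:
    """Inverse dynamics I(s, s') -> a for the dynamics s_next = s + gain * a."""
    def inverse_dynamics(s: State, s_target: State) -> Action:
        return (s_target - s) / gain
    return inverse_dynamics


def make_env_step(gain: float) -> Callable[[State, Action], State]:
    """Forward dynamics of a toy environment: s_next = s + gain * a."""
    def step(s: State, a: Action) -> State:
        return s + gain * a
    return step


def decoupled_policy(s: State, plan, inv) -> Action:
    """pi(s) = I(s, h(s)): plan a target state, then infer the action to reach it."""
    return inv(s, plan(s))


def rollout(s0: State, plan, inv, step, horizon: int = 5) -> List[State]:
    """Run the decoupled policy for `horizon` steps and record visited states."""
    states = [s0]
    s = s0
    for _ in range(horizon):
        a = decoupled_policy(s, plan, inv)
        s = step(s, a)
        states.append(s)
    return states


if __name__ == "__main__":
    # Same planner, two environments with different dynamics:
    # only the inverse dynamics model is swapped.
    print(rollout(3.0, planner, make_inverse_dynamics(1.0), make_env_step(1.0)))
    print(rollout(3.0, planner, make_inverse_dynamics(2.0), make_env_step(2.0)))
```

Both calls follow the same state trajectory (3.0, 2.0, 1.0, 0.0, ...) even though the environments respond differently to actions. This is the intuition behind transferring a planner across action spaces or transition dynamics: the "what state to reach next" knowledge is kept, and only the "how to get there" module changes.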
