Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning

04/07/2022
by Carl Qi, et al.

The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approaches infers the (unknown) reward function via inverse reinforcement learning (IRL) and then maximizes this reward function via reinforcement learning (RL). The policies learned via these approaches are, however, very brittle in practice and deteriorate quickly even under small test-time perturbations due to compounding errors. We propose Imitation with Planning at Test-time (IMPLANT), a new meta-algorithm for imitation learning that utilizes decision-time planning to correct for compounding errors of any base imitation policy. In contrast to existing approaches, we retain both the imitation policy and the reward model at decision time, thereby benefiting from the learning signal of the two components. Empirically, we demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments and excels at zero-shot generalization when subject to challenging perturbations in test-time dynamics.
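The idea of keeping both the imitation policy and the reward model at decision time can be illustrated with a small sketch. The code below is not the authors' implementation: it uses a simple random-shooting planner as a stand-in, on a toy one-dimensional task. The functions `base_policy`, `reward_model`, `dynamics_model`, and `plan_action`, as well as all parameter values, are hypothetical and chosen only for illustration. The planner samples candidate action sequences around the base policy, scores each rollout with the reward model, and returns the first action of the best candidate.

```python
import numpy as np


def base_policy(state):
    # Hypothetical imitation policy: steer toward the origin.
    return -0.5 * state


def reward_model(state):
    # Hypothetical stand-in for a learned reward: higher near the origin.
    return -state ** 2


def dynamics_model(state, action):
    # Assumed toy dynamics used for planning rollouts.
    return state + action


def plan_action(state, horizon=5, num_candidates=64, noise_std=0.3, rng=None):
    """Random-shooting planner guided by the base policy and the reward model."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.normal(0.0, noise_std, size=(num_candidates, horizon))
    noise[0] = 0.0  # candidate 0 is the unperturbed base-policy rollout

    states = np.full(num_candidates, float(state))
    returns = np.zeros(num_candidates)
    first_actions = None
    for t in range(horizon):
        # Propose actions near what the imitation policy would do.
        actions = base_policy(states) + noise[:, t]
        if t == 0:
            first_actions = actions
        states = dynamics_model(states, actions)
        # Score the visited states with the reward model.
        returns += reward_model(states)

    # Execute only the first action of the highest-scoring candidate.
    return float(first_actions[np.argmax(returns)])


if __name__ == "__main__":
    x = 2.0
    for step in range(5):
        a = plan_action(x)
        x = dynamics_model(x, a)
        print(f"step={step} action={a:.3f} state={x:.3f}")
```

Because the unperturbed base-policy rollout is always among the candidates, this sketch never selects a plan that scores worse than the base policy under the assumed reward and dynamics. Re-planning at every step is what lets the controller react when the state drifts away from what the base policy expects.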

research
06/02/2023

PAGAR: Imitation Learning with Protagonist Antagonist Guided Adversarial Reward

Imitation learning (IL) algorithms often rely on inverse reinforcement l...
research
10/15/2018

Deep Imitative Models for Flexible Inference, Planning, and Control

Imitation learning provides an appealing framework for autonomous contro...
research
05/06/2022

Diverse Imitation Learning via Self-Organizing Generative Models

Imitation learning is the task of replicating expert policy from demonst...
research
09/23/2020

What is the Reward for Handwriting? – Handwriting Generation by Imitation Learning

Analyzing the handwriting generation process is an important issue and h...
research
09/02/2022

TarGF: Learning Target Gradient Field for Object Rearrangement

Object Rearrangement is to move objects from an initial state to a goal ...
research
07/13/2017

Merge or Not? Learning to Group Faces via Imitation Learning

Given a large number of unlabeled face images, face grouping aims at clu...
research
10/17/2019

Single Episode Policy Transfer in Reinforcement Learning

Transfer and adaptation to new unknown environmental dynamics is a key c...
