Generalization Guarantees for Multi-Modal Imitation Learning

08/05/2020
by Allen Z. Ren, et al.

Control policies from imitation learning can often fail to generalize to novel environments due to imperfect demonstrations or the inability of imitation learning algorithms to accurately infer the expert's policies. In this paper, we present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework to provide upper bounds on the expected cost of policies in novel environments. We propose a two-stage training method where a latent policy distribution is first embedded with multi-modal expert behavior using a conditional variational autoencoder, and then "fine-tuned" in new training environments to explicitly optimize the generalization bound. We demonstrate strong generalization bounds and their tightness relative to empirical performance in simulation for (i) grasping diverse mugs, (ii) planar pushing with visual feedback, and (iii) vision-based indoor navigation, as well as through hardware experiments for the two manipulation tasks.
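
As a rough guide to what the "generalization bound" above looks like, here is the generic McAllester/Maurer-style PAC-Bayes inequality for costs normalized to [0, 1], N i.i.d. training environments, a fixed prior distribution P over policies, and a learned posterior Q; the paper may optimize a tighter variant (for instance a kl-inverse form), so the expression below is an illustrative sketch rather than the authors' exact bound:

With probability at least $1 - \delta$ over the draw of the $N$ training environments,
$$ C(Q) \;\le\; \hat{C}(Q) \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{N}}{\delta}}{2N}}, $$
where $C(Q)$ is the expected cost of the policy distribution $Q$ in novel environments, $\hat{C}(Q)$ is its empirical cost on the training environments, and $\mathrm{KL}(Q \,\|\, P)$ penalizes how far the fine-tuned posterior drifts from the prior (here, the distribution embedded by the conditional variational autoencoder). The second training stage described above would then minimize the right-hand side directly rather than the empirical cost alone.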


Related research

10/13/2017 · Burn-In Demonstrations for Multi-Modal Imitation Learning
Recent work on imitation learning has generated policies that reproduce ...

05/22/2018 · Maximum Causal Tsallis Entropy Imitation Learning
In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) ...

02/28/2020 · Probably Approximately Correct Vision-Based Planning using Motion Primitives
This paper presents a deep reinforcement learning approach for synthesiz...

02/22/2022 · Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks
Rearrangement tasks have been identified as a crucial challenge for inte...

01/20/2022 · Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees
Safety is a critical component of autonomous systems and remains a chall...

07/13/2021 · Distributionally Robust Policy Learning via Adversarial Environment Generation
Our goal is to train control policies that generalize well to unseen env...

07/27/2023 · Imitating Complex Trajectories: Bridging Low-Level Stability and High-Level Behavior
We propose a theoretical framework for studying the imitation of stochas...
