Imitating Complex Trajectories: Bridging Low-Level Stability and High-Level Behavior

07/27/2023
by Adam Block, et al.

We propose a theoretical framework for studying the imitation of stochastic, non-Markovian, potentially multi-modal (i.e., "complex") expert demonstrations in nonlinear dynamical systems. Our framework invokes low-level controllers, either learned or implicit in position-command control, to stabilize imitation policies around expert demonstrations. We show that with (a) a suitable low-level stability guarantee and (b) a stochastic continuity property of the learned policy that we call "total variation continuity" (TVC), an imitator that accurately estimates actions on the demonstrator's state distribution closely matches the demonstrator's distribution over entire trajectories. We then show that TVC can be ensured with minimal degradation of accuracy by combining a popular data-augmentation regimen with a novel algorithmic trick: adding augmentation noise at execution time. We instantiate our guarantees for policies parameterized by diffusion models and prove that if the learner accurately estimates the score of the (noise-augmented) expert policy, then the distribution of imitator trajectories is close to the demonstrator distribution in a natural optimal transport distance. Our analysis constructs intricate couplings between noise-augmented trajectories, a technique that may be of independent interest. We conclude by empirically validating our algorithmic recommendations.
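
The idea of adding augmentation noise at execution time can be illustrated with a toy sketch. This is one possible reading of the abstract, not the authors' implementation. The helper names `augment_dataset` and `act_with_noise`, the Gaussian noise model, the scale `sigma`, the toy expert, and the least-squares linear policy are all assumptions made for illustration; the paper itself considers diffusion-model policies.

```python
import numpy as np


def augment_dataset(states, actions, sigma, copies, rng):
    """Pair each expert action with `copies` Gaussian-perturbed copies of its state.

    Hypothetical helper: the noise model and `sigma` are assumptions, not from the paper.
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    noisy_states = np.repeat(states, copies, axis=0)
    noisy_states = noisy_states + sigma * rng.standard_normal(noisy_states.shape)
    return noisy_states, np.repeat(actions, copies, axis=0)


def act_with_noise(policy, state, sigma, rng):
    """Query the policy on a perturbed state, mirroring the training-time noise."""
    state = np.asarray(state, dtype=float)
    return policy(state + sigma * rng.standard_normal(state.shape))


rng = np.random.default_rng(0)

# Toy expert (assumption): the action pushes the state back toward the origin.
expert_states = rng.standard_normal((50, 2))
expert_actions = -expert_states

# Training time: fit a policy on noise-augmented states.
X, Y = augment_dataset(expert_states, expert_actions, sigma=0.1, copies=4, rng=rng)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # stand-in for a learned policy


def policy(state):
    return state @ W


# Execution time: inject noise at the same scale before querying the policy.
action = act_with_noise(policy, np.array([1.0, -2.0]), sigma=0.1, rng=rng)
```

In this sketch the policy sees states drawn from the same smoothed distribution during training and during execution, which is the intuition behind injecting noise at both stages; the sketch does not verify any of the paper's guarantees.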

research
05/30/2022

TaSIL: Taylor Series Imitation Learning

We propose Taylor Series Imitation Learning (TaSIL), a simple augmentati...
research
07/20/2023

On Combining Expert Demonstrations in Imitation Learning via Optimal Transport

Imitation learning (IL) seeks to teach agents specific tasks through exp...
research
03/25/2021

Adversarial Imitation Learning with Trajectorial Augmentation and Correction

Deep Imitation Learning requires a large number of expert demonstrations...
research
08/05/2020

Generalization Guarantees for Multi-Modal Imitation Learning

Control policies from imitation learning can often fail to generalize to...
research
10/12/2021

FILM: Following Instructions in Language with Modular Methods

Recent methods for embodied instruction following are typically trained ...
research
12/04/2022

Hierarchical Policy Blending As Optimal Transport

We present hierarchical policy blending as optimal transport (HiPBOT). T...
research
07/28/2020

Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations

With the research into development of quadruped robots picking up pace, ...
