Goal-Conditioned Imitation Learning using Score-based Diffusion Policies

04/05/2023
by   Moritz Reuss, et al.
0

We propose a new policy representation based on score-based diffusion models (SDMs). We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL) to learn general-purpose goal-specified policies from large uncurated datasets without rewards. Our new goal-conditioned policy architecture "BEhavior generation with ScOre-based Diffusion Policies" (BESO) leverages a generative, score-based diffusion model as its policy. BESO decouples the learning of the score model from the inference sampling process, and, hence allows for fast sampling strategies to generate goal-specified behavior in just 3 denoising steps, compared to 30+ steps of other diffusion based policies. Furthermore, BESO is highly expressive and can effectively capture multi-modality present in the solution space of the play data. Unlike previous methods such as Latent Plans or C-Bet, BESO does not rely on complex hierarchical policies or additional clustering for effective goal-conditioned behavior learning. Finally, we show how BESO can even be used to learn a goal-independent policy from play-data using classifier-free guidance. To the best of our knowledge this is the first work that a) represents a behavior policy based on such a decoupled SDM b) learns an SDM based policy in the domain of GCIL and c) provides a way to simultaneously learn a goal-dependent and a goal-independent policy from play-data. We evaluate BESO through detailed simulation and show that it consistently outperforms several state-of-the-art goal-conditioned imitation learning methods on challenging benchmarks. We additionally provide extensive ablation studies and experiments to demonstrate the effectiveness of our method for effective goal-conditioned behavior generation.

READ FULL TEXT

page 1

page 3

page 5

research
10/25/2019

Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

We present relay policy learning, a method for imitation and reinforceme...
research
06/28/2018

End-to-End Deep Imitation Learning: Robot Soccer Case Study

In imitation learning, behavior learning is generally done using the fea...
research
02/15/2020

Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning

This work considers two distinct settings: imitation learning and goal-c...
research
09/20/2023

Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning

In recent years, reinforcement learning and imitation learning have show...
research
05/04/2023

CCIL: Context-conditioned imitation learning for urban driving

Imitation learning holds great promise for addressing the complex task o...
research
03/05/2019

Learning Latent Plans from Play

We propose learning from teleoperated play data (LfP) as a way to scale ...
research
08/04/2021

Tolerance-Guided Policy Learning for Adaptable and Transferrable Delicate Industrial Insertion

Policy learning for delicate industrial insertion tasks (e.g., PC board ...

Please sign up or login with your details

Forgot password? Click here to reset