Goal-Conditioned Imitation Learning using Score-based Diffusion Policies

04/05/2023
by Moritz Reuss, et al.

We propose a new policy representation based on score-based diffusion models (SDMs). We apply it to Goal-Conditioned Imitation Learning (GCIL) to learn general-purpose, goal-specified policies from large, uncurated datasets without rewards. Our new goal-conditioned policy architecture, "BEhavior generation with ScOre-based Diffusion Policies" (BESO), uses a generative score-based diffusion model as its policy. BESO decouples the learning of the score model from the inference sampling process and hence allows fast sampling strategies that generate goal-specified behavior in just 3 denoising steps, compared to the 30+ steps of other diffusion-based policies. Furthermore, BESO is highly expressive and can effectively capture the multi-modality present in the solution space of the play data. Unlike previous methods such as Latent Plans or C-Bet, BESO does not rely on complex hierarchical policies or additional clustering for effective goal-conditioned behavior learning. Finally, we show how BESO can even be used to learn a goal-independent policy from play data using classifier-free guidance. To the best of our knowledge, this is the first work that a) represents a behavior policy based on such a decoupled SDM, b) learns an SDM-based policy in the domain of GCIL, and c) provides a way to simultaneously learn a goal-dependent and a goal-independent policy from play data. We evaluate BESO through detailed simulation and show that it consistently outperforms several state-of-the-art goal-conditioned imitation learning methods on challenging benchmarks. We additionally provide extensive ablation studies and experiments to demonstrate the effectiveness of our method for goal-conditioned behavior generation.
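To make the two mechanisms named in the abstract more concrete (a sampler that is decoupled from the trained score model, and classifier-free guidance), the following is a minimal illustrative sketch, not the authors' implementation. Everything in it is an assumption for illustration: the Euler-style sampler over a decreasing noise schedule, the schedule constants, the function names, and in particular `toy_denoiser`, a hypothetical stand-in for a trained network that simply returns a fixed target action. Because the sampler only calls the denoiser, the number of steps can be chosen at inference time without retraining.

```python
import numpy as np


def noise_schedule(n_steps, sigma_min=0.01, sigma_max=1.0, rho=7.0):
    """Decreasing noise levels from sigma_max to sigma_min, then a final 0.

    The constants here are illustrative assumptions, not values from the paper.
    """
    ramp = np.linspace(0.0, 1.0, n_steps)
    hi, lo = sigma_max ** (1.0 / rho), sigma_min ** (1.0 / rho)
    sigmas = (hi + ramp * (lo - hi)) ** rho
    return np.append(sigmas, 0.0)


def toy_denoiser(action, state, goal, sigma):
    """Hypothetical stand-in for a trained denoising network.

    With a goal it predicts the displacement from state to goal; with
    goal=None (the goal-independent case) it predicts a zero action.
    """
    if goal is None:
        return np.zeros_like(action)
    return goal - state


def guided_denoiser(action, state, goal, sigma, guidance_weight):
    """Standard classifier-free guidance mix of the two predictions."""
    uncond = toy_denoiser(action, state, None, sigma)
    cond = toy_denoiser(action, state, goal, sigma)
    return uncond + guidance_weight * (cond - uncond)


def sample_action(state, goal, n_steps=3, guidance_weight=1.0, seed=0):
    """Deterministic Euler sampler: start from noise, denoise in n_steps."""
    rng = np.random.default_rng(seed)
    sigmas = noise_schedule(n_steps)
    action = sigmas[0] * rng.standard_normal(state.shape)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = guided_denoiser(action, state, goal, sigma, guidance_weight)
        direction = (action - denoised) / sigma  # slope of the sampling ODE
        action = action + direction * (sigma_next - sigma)
    return action


state = np.array([0.0, 0.0])
goal = np.array([1.0, -2.0])
action = sample_action(state, goal, n_steps=3)
```

In this toy setting, `guidance_weight=1.0` recovers the purely goal-conditioned prediction and `guidance_weight=0.0` recovers the goal-independent one; intermediate or larger values interpolate or extrapolate between them. A real policy would replace `toy_denoiser` with a learned model trained with the goal randomly masked out.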

10/25/2019

Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

We present relay policy learning, a method for imitation and reinforceme...
06/28/2018

End-to-End Deep Imitation Learning: Robot Soccer Case Study

In imitation learning, behavior learning is generally done using the fea...
02/15/2020

Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning

This work considers two distinct settings: imitation learning and goal-c...
09/20/2023

Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning

In recent years, reinforcement learning and imitation learning have show...
05/04/2023

CCIL: Context-conditioned imitation learning for urban driving

Imitation learning holds great promise for addressing the complex task o...
03/05/2019

Learning Latent Plans from Play

We propose learning from teleoperated play data (LfP) as a way to scale ...
08/04/2021

Tolerance-Guided Policy Learning for Adaptable and Transferrable Delicate Industrial Insertion

Policy learning for delicate industrial insertion tasks (e.g., PC board ...