Reasoning with Latent Diffusion in Offline Reinforcement Learning

09/12/2023
by   Siddarth Venkatraman, et al.
0

Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching portions of suboptimal trajectories from the static dataset while avoiding extrapolation errors arising due to a lack of support in the dataset. Existing approaches use conservative methods that are tricky to tune and struggle with multi-modal data (as we show) or rely on noisy Monte Carlo return-to-go samples for reward conditioning. In this work, we propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills. This facilitates learning a Q-function while avoiding extrapolation error via batch-constraining. The latent space is also expressive and gracefully copes with multi-modal data. We show that the learned temporally-abstract latent space encodes richer task-specific information for offline RL tasks as compared to raw state-actions. This improves credit assignment and facilitates faster reward propagation during Q-learning. Our method demonstrates state-of-the-art performance on the D4RL benchmarks, particularly excelling in long-horizon, sparse-reward tasks.

READ FULL TEXT

page 16

page 17

research
06/23/2023

CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn an optimal policy from...
research
06/08/2020

Conservative Q-Learning for Offline Reinforcement Learning

Effectively leveraging large, previously collected datasets in reinforce...
research
10/19/2021

Offline Reinforcement Learning with Value-based Episodic Memory

Offline reinforcement learning (RL) shows promise of applying RL to real...
research
10/11/2022

ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning

The goal of offline reinforcement learning (RL) is to learn near-optimal...
research
12/08/2022

Confidence-Conditioned Value Functions for Offline Reinforcement Learning

Offline reinforcement learning (RL) promises the ability to learn effect...
research
07/10/2023

Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

Offline Reinforcement Learning (RL) methods leverage previous experience...
research
05/18/2023

Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models

Recently, reward-conditioned reinforcement learning (RCRL) has gained po...

Please sign up or login with your details

Forgot password? Click here to reset