Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

07/10/2023
by Suzan Ece Ada, et al.

Offline Reinforcement Learning (RL) methods leverage previously collected experience to learn policies that outperform the behavior policy used for data collection. In contrast to behavior cloning, which assumes the data comes from expert demonstrations, offline RL can work with non-expert data and multimodal behavior policies. However, because there is no online interaction during training, offline RL algorithms struggle to handle distribution shift and to represent policies effectively. Prior work on offline RL uses conditional diffusion models to represent multimodal behavior in the dataset. Nevertheless, these methods are not tailored toward generalization to out-of-distribution (OOD) states. We introduce a novel method named State Reconstruction for Diffusion Policies (SRDP), which incorporates state reconstruction feature learning into the recent class of diffusion policies to address the OOD generalization problem. The state reconstruction loss promotes generalizable representation learning of states, alleviating the distribution shift incurred by OOD states. We design a novel 2D Multimodal Contextual Bandit environment to illustrate the OOD generalization and faster convergence of SRDP compared to prior algorithms. In addition, we assess the performance of our model on D4RL continuous control benchmarks, namely the navigation of an 8-DoF ant and the forward locomotion of half-cheetah, hopper, and walker2d, achieving state-of-the-art results.
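To make the idea of an auxiliary state reconstruction loss concrete, the following is a minimal sketch and not the authors' implementation. It assumes a shared state encoder feeding two linear heads: one that predicts the noise added to an action (a standard denoising objective for a state-conditioned diffusion policy) and one that reconstructs the input state. The layer sizes, the linear noise schedule, the loss weight `RECON_WEIGHT`, and every function name are hypothetical. The sketch only evaluates the combined loss; it omits gradient updates and any policy-improvement term.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes and weights below are hypothetical choices for illustration.
STATE_DIM, ACTION_DIM, FEAT_DIM = 2, 2, 16
NUM_STEPS = 50        # number of diffusion steps
RECON_WEIGHT = 1.0    # weight of the state reconstruction term

# Linear noise schedule; alpha_bar[t] is the cumulative product of (1 - beta).
betas = np.linspace(1e-4, 0.02, NUM_STEPS)
alpha_bar = np.cumprod(1.0 - betas)


def init_params():
    """Randomly initialise a shared encoder and two linear heads."""
    scale = 0.1
    return {
        "enc": scale * rng.standard_normal((STATE_DIM, FEAT_DIM)),
        "dec": scale * rng.standard_normal((FEAT_DIM, STATE_DIM)),
        "eps": scale * rng.standard_normal((FEAT_DIM + ACTION_DIM + 1, ACTION_DIM)),
    }


def combined_loss(params, states, actions, rng):
    """Return (total, diffusion, reconstruction) losses for one batch."""
    batch = states.shape[0]

    # Shared state representation used by both heads.
    feats = np.tanh(states @ params["enc"])

    # Auxiliary head: reconstruct the state from its features.
    recon = feats @ params["dec"]
    recon_loss = np.mean((recon - states) ** 2)

    # Diffusion head: noise the action at a random step, then predict the noise.
    t = rng.integers(0, NUM_STEPS, size=batch)
    noise = rng.standard_normal(actions.shape)
    a_bar = alpha_bar[t][:, None]
    noisy_actions = np.sqrt(a_bar) * actions + np.sqrt(1.0 - a_bar) * noise
    t_feat = t[:, None] / NUM_STEPS  # scalar timestep feature in [0, 1)
    inputs = np.concatenate([feats, noisy_actions, t_feat], axis=1)
    pred_noise = inputs @ params["eps"]
    diffusion_loss = np.mean((pred_noise - noise) ** 2)

    total = diffusion_loss + RECON_WEIGHT * recon_loss
    return total, diffusion_loss, recon_loss


params = init_params()
states = rng.uniform(-1.0, 1.0, size=(32, STATE_DIM))
actions = rng.uniform(-1.0, 1.0, size=(32, ACTION_DIM))
total, diff_l, rec_l = combined_loss(params, states, actions, rng)
print(f"total={total:.4f} diffusion={diff_l:.4f} reconstruction={rec_l:.4f}")
```

Because both heads read the same features, minimising the reconstruction term shapes the representation that the noise predictor conditions on. This is one plausible reading of how a reconstruction loss could encourage representations that transfer to unseen states; the full text describes the actual architecture and objective.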

research
08/12/2022

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Offline reinforcement learning (RL), which aims to learn an optimal poli...
research
12/01/2022

Launchpad: Learning to Schedule Using Offline and Online RL Methods

Deep reinforcement learning algorithms have succeeded in several challen...
research
05/31/2023

Efficient Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from ...
research
07/08/2021

Offline Meta-Reinforcement Learning with Online Self-Supervision

Meta-reinforcement learning (RL) can meta-train policies that adapt to n...
research
04/08/2023

PDViz: a Visual Analytics Approach for State Policy Data

Sub-national governments across the United States implement a variety of...
research
11/29/2022

Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning

Offline reinforcement learning (RL) has received rising interest due to...
research
09/12/2023

Reasoning with Latent Diffusion in Offline Reinforcement Learning

Offline reinforcement learning (RL) holds promise as a means to learn hi...
