Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling

09/29/2022
by Huayu Chen et al.

In offline reinforcement learning, weighted regression is a common way to keep the learned policy close to the behavior policy and prevent it from selecting out-of-sample actions. In this work, we show that due to the limited distributional expressivity of policy models, previous methods might still select unseen actions during training, contrary to their original motivation. To address this problem, we adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model. The key insight is that such decoupling avoids learning an explicitly parameterized policy model with a closed-form expression. Directly learning the behavior policy allows us to leverage existing advances in generative modeling, such as diffusion-based methods, to model diverse behaviors. For action evaluation, we combine our method with an in-sample planning technique to further avoid selecting out-of-sample actions and increase computational efficiency. Experimental results on D4RL datasets show that our proposed method achieves competitive or superior performance compared with state-of-the-art offline RL methods, especially in complex tasks such as AntMaze. We also empirically demonstrate that our method can successfully learn from a heterogeneous dataset containing multiple distinctive but similarly successful strategies, whereas previous unimodal policies fail.
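The failure mode described above can be illustrated on a toy problem. The sketch below is a minimal, hypothetical illustration, not the authors' implementation: it assumes a 1-D action space, a made-up dataset with two equally good behavior modes at a = -1 and a = +1, and a hand-written critic `q_value`. It contrasts weighted regression with a unimodal Gaussian policy (whose weighted maximum-likelihood mean lands between the modes, on an action the data never contains) with a decoupled scheme in which a behavior model proposes candidates and the critic chooses among them. The paper uses a diffusion model for the behavior model; here `sample_behavior` simply resamples the dataset as a placeholder, and choosing candidates with weights proportional to exp(beta * Q) is an assumed selection rule.

```python
import math
import random


def make_dataset(n=1000, seed=0):
    """Hypothetical 1-D offline dataset with two equally good behavior modes.

    Actions cluster tightly around -1 and +1; nothing near 0 is ever observed.
    """
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) + rng.gauss(0.0, 0.05) for _ in range(n)]


def q_value(a):
    """Hypothetical critic that scores both modes equally (peaks at a = -1 and a = +1)."""
    return -(abs(a) - 1.0) ** 2


def weighted_gaussian_fit(actions, beta):
    """Weighted regression with a unimodal Gaussian policy.

    Maximizing sum_i w_i * log N(a_i; mu, sigma) with w_i = exp(beta * Q(a_i))
    gives mu = weighted mean and sigma^2 = weighted variance of the actions.
    """
    weights = [math.exp(beta * q_value(a)) for a in actions]
    total = sum(weights)
    mu = sum(w * a for w, a in zip(weights, actions)) / total
    var = sum(w * (a - mu) ** 2 for w, a in zip(weights, actions)) / total
    return mu, math.sqrt(var)


def sample_behavior(actions, rng, k):
    """Placeholder for an expressive generative behavior model.

    A diffusion model would be trained here; resampling the dataset itself is a
    hypothetical stand-in that reproduces the empirical action distribution.
    """
    return [rng.choice(actions) for _ in range(k)]


def select_action(actions, beta, k=32, seed=1):
    """Decoupled selection: draw k candidates from the behavior model, then let
    the critic pick one with probability proportional to exp(beta * Q)."""
    rng = random.Random(seed)
    candidates = sample_behavior(actions, rng, k)
    weights = [math.exp(beta * q_value(a)) for a in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    data = make_dataset()
    mu, sigma = weighted_gaussian_fit(data, beta=5.0)
    print(f"Gaussian policy mean: {mu:+.3f} (std {sigma:.3f})")
    print(f"Decoupled selection:  {select_action(data, beta=5.0):+.3f}")
```

Running the script shows the Gaussian policy's mean sitting near 0, a region with no support in the data, while the decoupled selector can only return actions the behavior model actually generated. The critic never has to be queried on, or regressed toward, actions outside the behavior distribution.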
