Behavior Prior Representation learning for Offline Reinforcement Learning

11/02/2022
by   Hongyu Zang, et al.

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset and cannot interact with the environment. Prior work commonly addresses this by pre-training state representations and then training a policy on top of them. In this work, we introduce a simple yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf offline RL algorithm. Theoretically, we prove that BPR provides performance guarantees when integrated into algorithms that either have policy improvement guarantees (conservative algorithms) or produce lower bounds on the policy values (pessimistic algorithms). Empirically, we show that BPR combined with existing state-of-the-art offline RL algorithms leads to significant improvements across several offline control benchmarks.
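The two-stage recipe described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-layer tanh encoder, the linear action head, the mean-squared-error cloning loss, and the names `pretrain_bpr_encoder` and `encode` are all assumptions made for illustration. Stage 1 fits the encoder by predicting dataset actions from dataset states; stage 2 would freeze the encoder and hand the encoded states to whichever offline RL algorithm is being used (not shown here).

```python
import numpy as np


def pretrain_bpr_encoder(states, actions, latent_dim=8, lr=0.05, steps=300, seed=0):
    """Stage 1 (illustrative): learn an encoder by behavior cloning.

    Assumed model: z = tanh(states @ W_enc), predicted action = z @ W_head,
    trained with plain gradient descent on the mean squared error.
    Returns the encoder weights and the per-step loss history.
    """
    rng = np.random.default_rng(seed)
    state_dim = states.shape[1]
    action_dim = actions.shape[1]
    W_enc = rng.normal(scale=0.1, size=(state_dim, latent_dim))
    W_head = rng.normal(scale=0.1, size=(latent_dim, action_dim))

    losses = []
    for _ in range(steps):
        z = np.tanh(states @ W_enc)          # state representation
        err = z @ W_head - actions           # behavior cloning residual
        losses.append(float(np.mean(err ** 2)))

        # Backpropagate the MSE loss by hand.
        grad_pred = 2.0 * err / err.size
        grad_head = z.T @ grad_pred
        grad_z = grad_pred @ W_head.T
        grad_enc = states.T @ (grad_z * (1.0 - z ** 2))  # tanh derivative

        W_head -= lr * grad_head
        W_enc -= lr * grad_enc

    # The action head is discarded; only the encoder is kept.
    return W_enc, losses


def encode(W_enc, states):
    """Stage 2 input (illustrative): frozen features for a downstream offline RL learner."""
    return np.tanh(states @ W_enc)
```

In this sketch the encoder weights are never updated after `pretrain_bpr_encoder` returns, which mirrors the "fixed representation" in the abstract; a downstream learner would see `encode(W_enc, states)` in place of the raw states.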


