Offline Prioritized Experience Replay

06/08/2023
by Yang Yue, et al.

Offline reinforcement learning (RL) is challenged by the distributional shift problem. To address it, existing work mainly focuses on designing sophisticated policy constraints between the learned policy and the behavior policy. However, these constraints are applied equally to well-performing and inferior actions through uniform sampling, which might negatively affect the learned policy. To alleviate this issue, we propose Offline Prioritized Experience Replay (OPER), featuring a class of priority functions designed to prioritize highly-rewarding transitions so that they are visited more frequently during training. Through theoretical analysis, we show that this class of priority functions induces an improved behavior policy, and when constrained to this improved policy, a policy-constrained offline RL algorithm is likely to yield a better solution. We develop two practical strategies to obtain priority weights: estimating advantages with a fitted value network (OPER-A), or using trajectory returns (OPER-R) for quick computation. OPER is a plug-and-play component for offline RL algorithms. As case studies, we evaluate OPER on five different algorithms: BC, TD3+BC, Onestep RL, CQL, and IQL. Extensive experiments demonstrate that both OPER-A and OPER-R significantly improve the performance of all baseline methods. Code and priority weights are available at https://github.com/sail-sg/OPER.
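To make the prioritization idea concrete, below is a minimal sketch of return-based priority weights in the spirit of OPER-R: transitions inherit the return of the trajectory they belong to, and sampling probabilities are derived from those returns. The normalization, softmax form, temperature, and dataset layout are assumptions made here for illustration, not the paper's exact formulation; see the official repository at https://github.com/sail-sg/OPER for the authors' implementation.

```python
import numpy as np

def trajectory_returns(rewards, done_flags):
    """Sum rewards within each trajectory and broadcast the total
    back to every transition of that trajectory."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    start = 0
    for i, done in enumerate(done_flags):
        if done or i == len(rewards) - 1:
            returns[start:i + 1] = rewards[start:i + 1].sum()
            start = i + 1
    return returns

def oper_r_weights(rewards, done_flags, temperature=1.0):
    """Turn per-transition trajectory returns into sampling probabilities
    (softmax over normalized returns; the temperature is a free choice here)."""
    g = trajectory_returns(rewards, done_flags)
    g = (g - g.mean()) / (g.std() + 1e-8)   # normalize returns across the dataset
    logits = g / temperature
    probs = np.exp(logits - logits.max())    # numerically stable softmax
    return probs / probs.sum()

# Usage: draw a prioritized minibatch for an offline RL update
# (synthetic rewards/dones stand in for a real offline dataset).
rng = np.random.default_rng(0)
rewards = rng.normal(size=1000)
dones = rng.random(1000) < 0.02
p = oper_r_weights(rewards, dones)
batch_idx = rng.choice(len(rewards), size=256, p=p)
```

Because the weights depend only on trajectory returns, they can be computed once before training and reused by any policy-constrained baseline, which is what makes this variant a cheap plug-and-play addition; the advantage-based variant (OPER-A) would instead require fitting a value network to estimate per-transition advantages.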


