Offline Experience Replay for Continual Offline Reinforcement Learning

05/23/2023
by Sibo Gai, et al.

The ability to continually learn new skills from a sequence of pre-collected offline datasets is desirable for an agent. However, learning a sequence of offline tasks consecutively is likely to cause catastrophic forgetting under resource-limited scenarios. In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), in which an agent learns a sequence of offline reinforcement learning tasks and pursues good performance on all learned tasks with a small replay buffer, without exploring any of the environments of the sequential tasks. To perform consistently across all sequential tasks, an agent must acquire new knowledge while preserving old knowledge in an offline manner. To this end, we introduce continual learning algorithms and experimentally find experience replay (ER) to be the most suitable algorithm for the CORL problem. However, we observe that introducing ER into CORL creates a new distribution shift problem: a mismatch between the experiences in the replay buffer and the trajectories of the learned policy. To address this issue, we propose a new model-based experience selection (MBES) scheme to build the replay buffer, in which a transition model is learned to approximate the state distribution. This model is used to bridge the distribution bias between the replay buffer and the learned policy by selecting, from the offline data, the experiences that most closely resemble those the learned policy would generate and storing them in the buffer. Moreover, to enhance the ability to learn new tasks, we retrofit the experience replay method with a new dual behavior cloning (DBC) architecture that prevents the behavior-cloning loss from disturbing the Q-learning process. We call the overall algorithm offline experience replay (OER). Extensive experiments demonstrate that OER outperforms SOTA baselines in widely used MuJoCo environments.
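As a rough illustration of the MBES idea described above, the sketch below scores each offline transition by how closely its stored next state matches what a learned transition model predicts under the current policy's action, and keeps only the closest transitions for the replay buffer. The function names (`policy`, `dynamics`, `select_replay_experiences`), the Euclidean distance, and the top-k selection rule are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of model-based experience selection (MBES), assuming a
# hypothetical learned transition model dynamics(s, a) -> s' and a current
# policy(s) -> a; names, shapes, and the distance metric are illustrative only.
import numpy as np

def select_replay_experiences(offline_states, offline_next_states,
                              policy, dynamics, buffer_size):
    """Keep the offline transitions whose next states best match what the
    learned transition model predicts under the current policy's actions."""
    # Actions the current policy would take in the offline states.
    policy_actions = policy(offline_states)                    # (N, action_dim)
    # Next states the learned dynamics model predicts for those actions.
    predicted_next = dynamics(offline_states, policy_actions)  # (N, state_dim)
    # Small distance means the stored transition resembles what the learned
    # policy would actually generate in the environment.
    distances = np.linalg.norm(offline_next_states - predicted_next, axis=1)
    # Store only the closest `buffer_size` transitions in the replay buffer.
    return np.argsort(distances)[:buffer_size]

# Toy usage with random stand-ins for the dataset, policy, and dynamics model.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 4))
next_states = rng.normal(size=(100, 4))
toy_policy = lambda s: s[:, :2]                                     # placeholder policy
toy_dynamics = lambda s, a: s + 0.1 * np.pad(a, ((0, 0), (0, 2)))   # placeholder model
buffer_indices = select_replay_experiences(states, next_states,
                                           toy_policy, toy_dynamics, buffer_size=10)
```

The key design choice sketched here is that selection is driven by the learned transition model rather than by reward or recency, which is what lets the buffer track the state distribution of the current policy instead of the behavior policy that collected the offline data.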


