Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning

06/27/2023
by Jinyi Liu, et al.

In recent years, data-driven reinforcement learning (RL), also known as offline RL, has gained significant attention. However, the role of data sampling techniques in offline RL has been overlooked, despite their potential to enhance performance in online RL. Recent research suggests that applying sampling techniques directly to state transitions does not consistently improve performance in offline RL. Therefore, in this study, we propose a memory technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the sampling perspective to trajectories for more comprehensive information extraction from limited data. TR enhances learning efficiency by sampling trajectories backward, which optimizes the use of subsequent state information. Building on TR, we construct a weighted critic target to avoid sampling unseen actions in offline training, and we propose Prioritized Trajectory Replay (PTR), which enables more efficient trajectory sampling, prioritized by various trajectory priority metrics. We demonstrate the benefits of integrating TR and PTR with existing offline RL algorithms on D4RL. In summary, our research emphasizes the significance of trajectory-based data sampling techniques in enhancing the efficiency and performance of offline RL algorithms.
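The two sampling ideas in the abstract, replaying a trajectory from its last transition to its first and choosing trajectories by a priority, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TrajectoryReplay`, the toy `(state, action, reward)` tuples, and the use of undiscounted return as the priority metric are all assumptions made for illustration, and the weighted critic target is not shown.

```python
import random
from typing import List, Sequence, Tuple

# Toy transition type (an assumption): (state, action, reward)
Transition = Tuple[int, int, float]


class TrajectoryReplay:
    """Hypothetical trajectory-level replay memory for an offline dataset."""

    def __init__(self, trajectories: Sequence[Sequence[Transition]],
                 prioritized: bool = False, seed: int = 0) -> None:
        self.trajectories: List[List[Transition]] = [list(t) for t in trajectories]
        self.prioritized = prioritized
        self.rng = random.Random(seed)
        # Assumed priority metric: undiscounted return, shifted so that
        # every trajectory keeps a strictly positive sampling weight.
        returns = [sum(r for _, _, r in t) for t in self.trajectories]
        lowest = min(returns)
        self.weights = [ret - lowest + 1e-3 for ret in returns]

    def sample_trajectory(self) -> int:
        """Return the index of a trajectory, uniformly or by priority."""
        indices = range(len(self.trajectories))
        if self.prioritized:
            return self.rng.choices(indices, weights=self.weights, k=1)[0]
        return self.rng.randrange(len(self.trajectories))

    def backward_transitions(self, idx: int) -> List[Transition]:
        """Return the transitions of one trajectory, last step first."""
        return list(reversed(self.trajectories[idx]))


# Two toy trajectories; the second has the higher return.
toy_data = [
    [(0, 0, 0.0), (1, 0, 1.0)],
    [(0, 1, 0.0), (2, 1, 0.0), (3, 0, 5.0)],
]
memory = TrajectoryReplay(toy_data, prioritized=True)
chosen = memory.sample_trajectory()
for state, action, reward in memory.backward_transitions(chosen):
    pass  # a critic update on this transition would go here
```

Visiting transitions in reverse order means that the value estimate for a later state has already been updated when an earlier state bootstraps from it, which is the intuition behind using subsequent state information more effectively.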

