Rethinking Population-assisted Off-policy Reinforcement Learning

05/04/2023
by Bowen Zheng, et al.

Off-policy reinforcement learning (RL) algorithms are sample efficient thanks to gradient-based updates and the reuse of data in a replay buffer, but they tend to converge to local optima because of limited exploration. Population-based algorithms, on the other hand, offer a natural exploration strategy, but their heuristic black-box operators are inefficient. Recent algorithms integrate the two methods, connecting them through a shared replay buffer. However, the effect that diverse data from population optimization iterations has on off-policy RL algorithms has not been thoroughly investigated. In this paper, we first analyze the combination of off-policy RL algorithms with population-based algorithms and show that using population data can introduce an overlooked error and harm performance. To test this, we propose a uniform and scalable training design and conduct experiments with our tailored framework on robot locomotion tasks from OpenAI Gym. Our results substantiate that using population data in off-policy RL can cause instability during training and even degrade performance. To remedy this issue, we further propose a double replay buffer design that provides more on-policy data, and we show its effectiveness through experiments. Our results offer practical insights for training these hybrid methods.
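The abstract does not spell out how the double replay buffer is built, so the following is only a minimal sketch of one plausible reading: transitions collected by the RL agent and transitions collected by the population are stored separately, and each training batch draws a fixed share from the agent's own buffer. The class name `DoubleReplayBuffer`, the `agent_fraction` parameter, and the uniform sampling scheme are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import deque


class DoubleReplayBuffer:
    """Hypothetical sketch: two FIFO buffers, one per data source."""

    def __init__(self, capacity, agent_fraction=0.5, seed=None):
        if not 0.0 <= agent_fraction <= 1.0:
            raise ValueError("agent_fraction must be in [0, 1]")
        # Transitions gathered by the RL agent itself (closer to on-policy).
        self.agent_buffer = deque(maxlen=capacity)
        # Transitions gathered by population members (more off-policy).
        self.population_buffer = deque(maxlen=capacity)
        # Assumed knob: target share of each batch taken from agent_buffer.
        self.agent_fraction = agent_fraction
        self.rng = random.Random(seed)

    def add(self, transition, from_agent):
        """Store a transition in the buffer that matches its source."""
        target = self.agent_buffer if from_agent else self.population_buffer
        target.append(transition)

    def sample(self, batch_size):
        """Draw a mixed batch without replacement from both buffers.

        If the agent buffer holds too few transitions, the remainder is
        taken from the population buffer; the batch may come back smaller
        than batch_size when both buffers are nearly empty.
        """
        n_agent = min(round(batch_size * self.agent_fraction),
                      len(self.agent_buffer))
        n_pop = min(batch_size - n_agent, len(self.population_buffer))
        batch = self.rng.sample(list(self.agent_buffer), n_agent)
        batch += self.rng.sample(list(self.population_buffer), n_pop)
        return batch
```

Under these assumptions, raising `agent_fraction` shifts each batch toward data generated by the current RL policy, which is one way to read "provides more on-policy data"; a single shared buffer corresponds to having no such control over the mix.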


research
06/09/2023

The Role of Diverse Replay for Generalisation in Reinforcement Learning

In reinforcement learning (RL), key components of many algorithms are th...
research
06/26/2022

Analysis of Stochastic Processes through Replay Buffers

Replay buffers are a key component in many reinforcement learning scheme...
research
09/15/2022

On the Reuse Bias in Off-Policy Reinforcement Learning

Importance sampling (IS) is a popular technique in off-policy evaluation...
research
10/03/2021

Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations

Reinforcement Learning (RL) has achieved significant success in applicat...
research
07/15/2023

An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets

Reinforcement Learning (RL) algorithms aim to learn an optimal policy by...
research
09/18/2023

Contrastive Initial State Buffer for Reinforcement Learning

In Reinforcement Learning, the trade-off between exploration and exploit...
research
06/26/2020

DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

This paper prescribes a suite of techniques for off-policy Reinforcement...
