Supplementing Gradient-Based Reinforcement Learning with Simple Evolutionary Ideas

05/10/2023
by Harshad Khadilkar, et al.

We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL), through the use of evolutionary operators. The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space. Unlike prior literature on combining evolutionary search (ES) with RL, this work does not generate a distribution of agents from a common mean and covariance matrix. Nor does it require the evaluation of the entire population of policies at every time step. Instead, we focus on gradient-based training throughout the life of every policy (individual), with only sparse evolutionary exploration. The resulting algorithm is shown to be robust to hyperparameter variations. As a surprising corollary, we show that simply initialising and training multiple RL agents with a common memory (with no further evolutionary updates) outperforms several standard RL baselines.
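
The idea of a population trained by gradient steps on a common experience buffer, with occasional crossover and mutation, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a hypothetical three-armed bandit (`ARM_MEANS`), tabular value estimates in place of neural policies, and illustrative names and settings (`gradient_step`, `crossover`, `mutate`, `fitness`, `evolve_every`), none of which come from the paper.

```python
import random

ARM_MEANS = [0.1, 0.5, 0.9]  # hypothetical toy bandit, not from the paper


def pull(arm, rng):
    """Return a noisy reward for the chosen arm."""
    return ARM_MEANS[arm] + rng.gauss(0.0, 0.1)


def act(q, rng, eps=0.2):
    """Epsilon-greedy action from one agent's value estimates."""
    if rng.random() < eps:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


def gradient_step(q, batch, lr=0.1):
    """One SGD step per sample on the squared error (r - q[a])^2 / 2."""
    for arm, reward in batch:
        q[arm] += lr * (reward - q[arm])


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each entry is copied from one parent at random."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]


def mutate(q, rng, sigma=0.05):
    """Return a copy with small Gaussian noise added to every entry."""
    return [v + rng.gauss(0.0, sigma) for v in q]


def fitness(q):
    """True mean reward of the agent's greedy arm (a toy-only shortcut)."""
    return ARM_MEANS[max(range(len(q)), key=lambda a: q[a])]


def train(pop_size=4, steps=2000, evolve_every=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in ARM_MEANS] for _ in range(pop_size)]
    buffer = []  # experience shared by every agent
    for step in range(1, steps + 1):
        # Every agent acts and writes to the common buffer.
        for q in population:
            arm = act(q, rng)
            buffer.append((arm, pull(arm, rng)))
        # Every agent takes gradient steps on samples from the common buffer.
        for q in population:
            gradient_step(q, rng.sample(buffer, min(8, len(buffer))))
        # Sparse evolutionary update: replace the worst agent with a
        # mutated crossover of the two fittest agents.
        if step % evolve_every == 0:
            ranked = sorted(population, key=fitness, reverse=True)
            child = mutate(crossover(ranked[0], ranked[1], rng), rng)
            worst = min(range(pop_size), key=lambda i: fitness(population[i]))
            population[worst] = child
    return population, buffer
```

Calling `train()` returns the final population and the shared buffer. In this sketch, setting `evolve_every` larger than `steps` disables the evolutionary update entirely, which leaves several agents training on a common memory, the simpler setting mentioned at the end of the abstract.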

research
06/15/2020

QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning

We propose a novel reinforcement learning algorithm, QD-RL, that incorpor...
research
10/26/2022

ERL-Re^2: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation

Deep Reinforcement Learning (Deep RL) and Evolutionary Algorithm (EA) ar...
research
06/18/2019

Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination

A key challenge for Multiagent RL (Reinforcement Learning) is the design...
research
03/09/2023

Evolving Populations of Diverse RL Agents with MAP-Elites

Quality Diversity (QD) has emerged as a powerful alternative optimizatio...
research
02/08/2019

Novelty Search for Deep Reinforcement Learning Policy Network Weights by Action Sequence Edit Metric Distance

Reinforcement learning (RL) problems often feature deceptive local optim...
research
01/09/2020

Population-Guided Parallel Policy Search for Reinforcement Learning

In this paper, a new population-guided parallel learning scheme is propo...
research
05/30/2021

Shaped Policy Search for Evolutionary Strategies using Waypoints

In this paper, we try to improve exploration in Blackbox methods, partic...
