Effective Diversity in Population-Based Reinforcement Learning

by Jack Parker-Holder, et al.
University of Oxford
UC Berkeley

Maintaining a population of solutions has been shown to increase exploration in reinforcement learning, an effect typically attributed to the greater diversity of behaviors the population considers. One class of methods, novelty search, boosts diversity across agents further via a multi-objective optimization formulation. Despite their intuitive appeal, these mechanisms have several shortcomings. First, they use mean-field updates, which induce cycling behaviors. Second, they often rely on handcrafted behavior characterizations, which require domain knowledge. Third, boosting diversity often hurts the optimization of behaviors that are already earning high reward, and the relative weighting of the novelty and reward terms is usually hardcoded or requires tedious tuning or annealing. In this paper, we introduce a novel measure of population-wide diversity, leveraging ideas from Determinantal Point Processes. We combine this measure with the reward function in a principled fashion and, borrowing ideas from online learning, adapt the degree of diversity during training. We show that this approach, combined with task-agnostic behavioral embeddings, outperforms previous methods for multi-objective optimization, as well as vanilla algorithms that optimize solely for rewards.
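The abstract combines three ingredients: a determinant-based (DPP-style) diversity score over the population, a reward term, and an online rule for choosing how much weight diversity gets. The block below is a minimal pure-Python sketch of that idea under stated assumptions, not the authors' implementation. It assumes each agent is already summarized by a fixed-length behavioral embedding vector; it scores diversity as det(K) with K[i][j] = exp(-||e_i - e_j||^2 / (2 * length_scale^2)); it adds that score to the summed rewards with weight lam; and it picks lam with a Beta-Bernoulli Thompson-sampling bandit. The names `length_scale`, `LambdaBandit`, the candidate arms `(0.0, 0.5)`, and the "best return improved" feedback signal are all hypothetical choices for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def rbf_kernel(embeddings: Sequence[Sequence[float]], length_scale: float = 1.0) -> List[List[float]]:
    """Squared-exponential kernel matrix over behavioral embeddings (assumed kernel choice)."""
    n = len(embeddings)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            k[i][j] = math.exp(-sq_dist / (2.0 * length_scale ** 2))
    return k


def determinant(matrix: Sequence[Sequence[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        # Choose the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # (near-)singular: some agents are behaviorally redundant
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def population_diversity(embeddings: Sequence[Sequence[float]], length_scale: float = 1.0) -> float:
    """det(K): close to 0 when agents behave alike, close to 1 when they are far apart."""
    return determinant(rbf_kernel(embeddings, length_scale))


def combined_objective(rewards: Sequence[float], embeddings: Sequence[Sequence[float]],
                       lam: float, length_scale: float = 1.0) -> float:
    """Summed population reward plus lam times the diversity score (assumed form)."""
    if len(rewards) != len(embeddings):
        raise ValueError("need one embedding per agent")
    return sum(rewards) + lam * population_diversity(embeddings, length_scale)


class LambdaBandit:
    """Hypothetical Thompson-sampling bandit over candidate diversity weights."""

    def __init__(self, arms: Sequence[float] = (0.0, 0.5), seed: int = None):
        self.arms = list(arms)
        self.alpha = [1.0] * len(self.arms)  # Beta prior successes + 1
        self.beta = [1.0] * len(self.arms)   # Beta prior failures + 1
        self.rng = random.Random(seed)

    def select(self) -> Tuple[int, float]:
        # Sample a success rate per arm and play the arm with the highest sample.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        idx = max(range(len(samples)), key=samples.__getitem__)
        return idx, self.arms[idx]

    def update(self, idx: int, improved: bool) -> None:
        # Assumed feedback: did the population's best return improve this iteration?
        if improved:
            self.alpha[idx] += 1.0
        else:
            self.beta[idx] += 1.0
```

For example, two agents with identical embeddings give a singular kernel matrix and `population_diversity([[0.0, 0.0], [0.0, 0.0]])` returns `0.0`, so the diversity term adds nothing; spreading them apart pushes the score toward 1. Using the determinant rather than a sum of pairwise distances is what makes the measure population-wide: adding a near-duplicate agent drives the whole score toward zero instead of leaving it almost unchanged.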


Code Repositories


arXiv:2002.00632 implementation