Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

11/05/2020
by Tanmay Gangwani, et al.

Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning. It facilitates learning a population of agents where each member is optimized to simultaneously accumulate high task returns and exhibit behavioral diversity compared to other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on the f-divergence between the stationary distributions of policies, we convert the problem to that of efficiently estimating the ratio of these stationary distributions. We then study various distribution ratio estimators used previously for off-policy evaluation and imitation, and re-purpose them to compute the gradients for policies in an ensemble such that the resultant population is diverse and of high quality.
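To make the link between a density ratio and a divergence-based kernel concrete, here is a minimal sketch on a small discrete state space. It is an illustration under stated assumptions, not the paper's implementation: the choice of Jensen-Shannon divergence as the f-divergence, the exponential kernel with a `temperature` parameter, the count-based ratio (standing in for a learned ratio estimator), and all function names are hypothetical.

```python
import math
from collections import Counter


def visitation_distribution(samples, support):
    """Empirical state-visitation distribution with add-one smoothing.

    Smoothing keeps every probability positive so ratios are well defined.
    """
    counts = Counter(samples)
    total = len(samples) + len(support)
    return {s: (counts[s] + 1) / total for s in support}


def density_ratio(p, q):
    """Pointwise ratio w(s) = p(s) / q(s).

    Assumption: a count-based stand-in for a learned ratio estimator.
    """
    return {s: p[s] / q[s] for s in p}


def js_divergence(p, q):
    """Jensen-Shannon divergence expressed through the ratio w = p / q."""
    w = density_ratio(p, q)
    js = 0.0
    for s in p:
        m = 0.5 * (w[s] + 1.0)  # mixture density divided by q(s)
        # KL(p || mix) contributes q * w * log(w / m);
        # KL(q || mix) contributes q * log(1 / m).
        js += 0.5 * q[s] * (w[s] * math.log(w[s] / m) + math.log(1.0 / m))
    return js


def kernel(p, q, temperature=1.0):
    """Hypothetical similarity kernel: exp(-JS(p, q) / temperature)."""
    return math.exp(-js_divergence(p, q) / temperature)


# Toy data: states visited by two hypothetical policies.
states = ["a", "b", "c"]
d_pi1 = visitation_distribution(["a", "a", "b", "a", "c", "a"], states)
d_pi2 = visitation_distribution(["c", "c", "b", "c", "a", "c"], states)

print("JS divergence:", js_divergence(d_pi1, d_pi2))
print("kernel value:", kernel(d_pi1, d_pi2))
```

The kernel equals 1 when two policies induce the same visitation distribution and shrinks as their distributions diverge, which is the kind of signal a repulsive term in Stein variational gradient descent can use to push ensemble members apart.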

research
06/15/2020

QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning

We propose a novel reinforcement learning algorithm, QD-RL, that incorpor...
research
05/23/2023

Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning

Training generally capable agents that perform well in unseen dynamic en...
research
09/26/2022

DEFT: Diverse Ensembles for Fast Transfer in Reinforcement Learning

Deep ensembles have been shown to extend the positive effect seen in typ...
research
02/21/2020

GenDICE: Generalized Offline Estimation of Stationary Values

An important problem that arises in reinforcement learning and Monte Car...
research
11/22/2022

Efficient Exploration using Model-Based Quality-Diversity with Gradients

Exploration is a key challenge in Reinforcement Learning, especially in ...
research
06/04/2023

Data Quality in Imitation Learning

In supervised learning, the question of data quality and curation has be...
research
02/04/2019

PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

Previously, the exploding gradient problem has been explained to be cent...
