Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning

02/08/2022
by Bryon Tjanaka, et al.

Consider a walking agent that must adapt to damage. To approach this task, we can train a collection of policies and have the agent select a suitable policy when damaged. Training this collection may be viewed as a quality diversity (QD) optimization problem, where we search for solutions (policies) that maximize an objective (walking forward) while spanning a set of measures (measurable characteristics). Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures. However, such gradients are typically unavailable in RL settings due to non-differentiable environments. To apply DQD in RL settings, we propose to approximate objective and measure gradients with evolution strategies and actor-critic methods. We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks. One variant achieves performance (QD score) comparable to that of the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks. These results provide insight into the limitations of CMA-MEGA in domains that require rigorous optimization of the objective and where exact gradients are unavailable.
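As a rough illustration of approximating gradients with evolution strategies, the sketch below estimates the gradient of a black-box function from paired, mirrored Gaussian perturbations. It is a generic estimator under stated assumptions, not necessarily the one used in the paper: the function name `es_gradient`, the toy `objective` and `measure` functions, and all parameter values are hypothetical stand-ins for an RL return and a measure computed from policy rollouts.

```python
import random


def es_gradient(f, theta, sigma=0.1, num_pairs=2000, seed=0):
    """Estimate the gradient of a black-box function f at theta.

    For each Gaussian noise vector eps, f is evaluated at
    theta + sigma * eps and theta - sigma * eps (mirrored sampling),
    and the difference is used to weight eps.
    """
    rng = random.Random(seed)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([t + sigma * e for t, e in zip(theta, eps)])
        minus = f([t - sigma * e for t, e in zip(theta, eps)])
        # Finite-difference weight, averaged over all pairs.
        scale = (plus - minus) / (2.0 * sigma * num_pairs)
        for i in range(dim):
            grad[i] += scale * eps[i]
    return grad


# Hypothetical toy functions; in an RL setting each call would be
# a (non-differentiable) rollout of the policy with parameters theta.
def objective(theta):
    return -sum(t * t for t in theta)  # true gradient is -2 * theta


def measure(theta):
    return theta[0]  # true gradient is [1, 0]


theta = [1.0, -2.0]
obj_grad = es_gradient(objective, theta)   # roughly [-2, 4]
meas_grad = es_gradient(measure, theta)    # roughly [1, 0]
print(obj_grad, meas_grad)
```

A DQD algorithm such as CMA-MEGA would consume one such gradient for the objective and one per measure. The estimates are noisy, and each one costs `2 * num_pairs` function evaluations, which is the kind of cost that matters when every evaluation is an environment rollout.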

research
06/07/2021

Differentiable Quality Diversity

Quality diversity (QD) is a growing branch of stochastic optimization re...
research
01/29/2022

Zeroth-Order Actor-Critic

Zeroth-order optimization methods and policy gradient based first-order ...
research
04/14/2022

Accelerated Policy Learning with Parallel Differentiable Simulation

Deep reinforcement learning can generate complex control policies, but r...
research
03/07/2023

MAP-Elites with Descriptor-Conditioned Gradients and Archive Distillation into a Single Policy

Quality-Diversity algorithms, such as MAP-Elites, are a branch of Evolut...
research
02/27/2017

Reinforcement Learning with Deep Energy-Based Policies

We propose a method for learning expressive energy-based policies for co...
research
12/15/2022

Residual Policy Learning for Powertrain Control

Eco-driving strategies have been shown to provide significant reductions...
