Policy Manifold Search: Exploring the Manifold Hypothesis for Diversity-based Neuroevolution

by Nemanja Rakicevic et al.

Neuroevolution is an alternative to gradient-based optimisation that has the potential to avoid local minima and allows parallelisation. Its main limiting factor is that it usually scales poorly with the dimensionality of the parameter space. Inspired by recent work examining neural network intrinsic dimension and loss landscapes, we hypothesise that there exists a low-dimensional manifold, embedded in the policy network parameter space, around which a high density of diverse and useful policies is located. This paper proposes a novel method for diversity-based policy search via Neuroevolution that leverages learned representations of the policy network parameters by performing policy search in the learned representation space. Our method relies on the Quality-Diversity (QD) framework, which provides a principled approach to policy search and maintains a collection of diverse policies that is used as a dataset for learning policy representations. Further, we use the Jacobian of the inverse-mapping function to guide the search in the representation space. This ensures that the generated samples remain in the high-density regions after mapping back to the original space. Finally, we evaluate our contributions on four continuous-control tasks in simulated environments and compare them to diversity-based baselines.
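To make the Jacobian-guided search step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical one-layer `tanh` decoder standing in for the inverse-mapping function, and it assumes one plausible way of using the Jacobian: latent-space noise is drawn with covariance proportional to the inverse of `J.T @ J`, so that the first-order step in parameter space has a controlled size. The names `decoder`, `decoder_jacobian`, `jacobian_scaled_mutation`, `sigma`, and `eps` are all illustrative.

```python
import numpy as np

def decoder(z, W, b):
    """Hypothetical inverse mapping: latent code z -> policy parameters theta."""
    return np.tanh(W @ z + b)

def decoder_jacobian(z, W, b):
    """Analytic Jacobian d(theta)/d(z) of the tanh decoder above."""
    pre = W @ z + b
    return (1.0 - np.tanh(pre) ** 2)[:, None] * W

def jacobian_scaled_mutation(z, W, b, sigma, rng, eps=1e-6):
    """Perturb z with noise of covariance sigma^2 * (J^T J + eps*I)^-1.

    Assumption: scaling by the inverse pull-back metric keeps the
    first-order parameter-space step at roughly sigma * ||xi||,
    regardless of how strongly the decoder stretches space near z.
    """
    J = decoder_jacobian(z, W, b)
    G = J.T @ J + eps * np.eye(z.size)   # regularised pull-back metric
    L = np.linalg.cholesky(G)            # G = L @ L.T
    xi = rng.standard_normal(z.size)     # isotropic latent noise
    dz = sigma * np.linalg.solve(L.T, xi)
    return z + dz

# Illustrative sizes: 2-D latent space, 8 policy parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 2))
b = rng.standard_normal(8)
z = np.zeros(2)

z_new = jacobian_scaled_mutation(z, W, b, sigma=0.1, rng=rng)
theta_new = decoder(z_new, W, b)         # candidate policy parameters
```

In a QD loop, `theta_new` would then be evaluated in the environment and considered for insertion into the policy collection; that part is omitted here.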

