On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent

12/18/2017
by Xingwen Zhang, et al.

Because stochastic gradient descent (SGD) has shown promise in optimizing neural networks with millions of parameters, and few if any alternatives are known to exist, it has moved to the heart of leading approaches to reinforcement learning (RL). For that reason, the recent result from OpenAI showing that a particular kind of evolution strategy (ES) can rival the performance of SGD-based deep RL methods with large neural networks provoked surprise. This result is difficult to interpret in part because of lingering ambiguity about how ES actually relates to SGD. The aim of this paper is to significantly reduce this ambiguity through a series of MNIST-based experiments designed to uncover their relationship. As a simple supervised problem without domain noise (unlike most RL), MNIST makes it possible (1) to measure the correlation between gradients computed by ES and SGD and (2) to then develop an SGD-based proxy that accurately predicts the performance of different ES population sizes. These innovations give a new level of insight into the real capabilities of ES and also lead to some unconventional means for applying ES to supervised problems that shed further light on its differences from SGD. Incorporating these lessons, the paper concludes by demonstrating that ES can achieve 99% accuracy on MNIST, a number higher than any previously published result for any evolutionary method. While not by any means suggesting that ES should substitute for SGD in supervised learning, the suite of experiments herein enables more informed decisions on the application of ES within RL and other paradigms.
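The comparison at the heart of the paper is between the gradient that the OpenAI-style ES estimates from population fitness evaluations and the gradient computed analytically (as SGD would via backpropagation). The sketch below is a minimal illustration of that comparison, not the paper's code: it applies the standard OpenAI ES estimator (antithetic Gaussian perturbations) to a toy quadratic loss and reports the cosine similarity with the exact gradient. The toy objective, dimensionality, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's code): compare an OpenAI-style ES gradient
# estimate with the exact gradient on a toy quadratic loss. The objective and
# all parameter values below are illustrative assumptions.

rng = np.random.default_rng(0)
dim = 100
target = rng.normal(size=dim)

def loss(theta):
    # Simple quadratic "fitness"; stands in for a real training loss.
    return 0.5 * np.sum((theta - target) ** 2)

def exact_grad(theta):
    # Analytic gradient of the quadratic loss.
    return theta - target

def es_grad(theta, pop_size=200, sigma=0.1):
    # OpenAI ES estimator with antithetic (mirrored) sampling:
    #   g ~= (1 / (pop_size * sigma)) * sum_i loss(theta + sigma * eps_i) * eps_i
    eps = rng.normal(size=(pop_size // 2, dim))
    eps = np.concatenate([eps, -eps])               # mirrored perturbations
    returns = np.array([loss(theta + sigma * e) for e in eps])
    return (eps.T @ returns) / (len(eps) * sigma)

theta = rng.normal(size=dim)
g_true = exact_grad(theta)
g_es = es_grad(theta)
cosine = g_true @ g_es / (np.linalg.norm(g_true) * np.linalg.norm(g_es))
print(f"cosine similarity between ES and exact gradient: {cosine:.3f}")
```

In this toy setting, larger values of pop_size reduce the variance of the estimate and push the cosine similarity toward 1 on average; this is the same kind of gradient-correlation measurement the paper carries out on MNIST.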


Related research

10/16/2018
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
We propose a population-based Evolutionary Stochastic Gradient Descent (...

09/29/2018
Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep Learning
Although stochastic gradient descent (SGD) is a driving force behind the...

12/14/2015
Preconditioned Stochastic Gradient Descent
Stochastic gradient descent (SGD) still is the workhorse for many practi...

06/01/2022
Computing the Variance of Shuffling Stochastic Gradient Algorithms via Power Spectral Density Analysis
When solving finite-sum minimization problems, two common alternatives t...

05/20/2023
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Whenever applicable, the Stochastic Gradient Descent (SGD) has shown its...

07/16/2018
Evolving Differentiable Gene Regulatory Networks
Over the past twenty years, artificial Gene Regulatory Networks (GRNs) h...

03/16/2020
Explaining Memorization and Generalization: A Large-Scale Study with Coherent Gradients
Coherent Gradients is a recently proposed hypothesis to explain why over...
