Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods

06/07/2019
by   Karel Lenc, et al.
9

In this work we show that Evolution Strategies (ES) are a viable method for learning non-differentiable parameters of large supervised models. ES are black-box optimization algorithms that estimate distributions of model parameters; however they have only been used for relatively small problems so far. We show that it is possible to scale ES to more complex tasks and models with millions of parameters. While using ES for differentiable parameters is computationally impractical (although possible), we show that a hybrid approach is practically feasible in the case where the model has both differentiable and non-differentiable parameters. In this approach we use standard gradient-based methods for learning differentiable weights, while using ES for learning non-differentiable parameters - in our case sparsity masks of the weights. This proposed method is surprisingly competitive, and when parallelized over multiple devices has only negligible training time overhead compared to training with gradient descent. Additionally, this method allows to train sparse models from the first training step, so they can be much larger than when using methods that require training dense models first. We present results and analysis of supervised feed-forward models (such as MNIST and CIFAR-10 classification), as well as recurrent models, such as SparseWaveRNN for text-to-speech.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/13/2019

Neural network gradient-based learning of black-box function interfaces

Deep neural networks work well at approximating complicated functions wh...
research
06/16/2021

Amortized Synthesis of Constrained Configurations Using a Differentiable Surrogate

In design, fabrication, and control problems, we are often faced with th...
research
10/18/2019

Combinatorial Losses through Generalized Gradients of Integer Linear Programs

When samples have internal structure, we often see a mismatch between th...
research
02/05/2021

In-Loop Meta-Learning with Gradient-Alignment Reward

At the heart of the standard deep learning training loop is a greedy gra...
research
04/06/2018

Differentiable plasticity: training plastic neural networks with backpropagation

How can we build agents that keep learning from experience, quickly and ...
research
05/30/2021

Parameter Estimation for the SEIR Model Using Recurrent Nets

The standard way to estimate the parameters Θ_SEIR (e.g., the transmissi...
research
06/23/2020

Differentiable Segmentation of Sequences

Segmented models are widely used to describe non-stationary sequential d...

Please sign up or login with your details

Forgot password? Click here to reset