Global convergence of neuron birth-death dynamics

02/05/2019
by Grant Rotskoff, et al.

Neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of "overparameterized" models. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that converges to a globally optimal solution for networks with a single hidden layer under appropriate assumptions. In this work, we propose a non-local mass transport dynamics that leads to a modified PDE with the same minimizer. We implement this non-local dynamics as a stochastic neuronal birth-death process and prove that it accelerates the rate of convergence in the mean-field limit. We subsequently realize this PDE with two classes of numerical schemes that converge to the mean-field equation, each of which can easily be implemented for neural networks with a finite number of parameters. We illustrate our algorithms with two models to provide intuition for the mechanism through which convergence is accelerated.
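To make the mechanism concrete, below is a minimal NumPy sketch of one way such birth-death dynamics can be discretized for a finite two-layer network: plain gradient-descent (transport) steps interleaved with a stochastic kill-and-clone (reaction) step driven by each neuron's potential V. This is an illustrative reading of the abstract, not the authors' reference implementation; the toy target, all hyperparameters (N, lr, alpha, dt), and helper names (activations, potential) are assumptions made for the example, and the squared-loss potential used here is one standard mean-field choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative, not from the paper): a two-layer ReLU network in
# its mean-field parameterization f(x) = (1/N) * sum_i c_i * relu(w_i*x + b_i),
# trained to fit y = sin(pi * x) on [-1, 1] with the squared loss.
N = 200
X = rng.uniform(-1.0, 1.0, size=400)
Y = np.sin(np.pi * X)
params = rng.normal(size=(N, 3))   # rows are particles theta_i = (c_i, w_i, b_i)

def activations(params, x):
    c, w, b = params.T
    return np.maximum(w[None, :] * x[:, None] + b[None, :], 0.0)  # shape (M, N)

def potential(params, x, y):
    """Per-particle potential V(theta_i) = E_x[(f(x) - y) * c_i * relu(w_i x + b_i)],
    the functional derivative of the squared loss evaluated at particle i."""
    act = activations(params, x)
    resid = act @ params[:, 0] / len(params) - y
    return (resid[:, None] * act).mean(axis=0) * params[:, 0]

lr, alpha, dt = 0.05, 1.0, 0.05    # gradient step, birth-death rate, reaction time step
for step in range(2001):
    # --- transport step: plain gradient descent on every particle ---
    c = params[:, 0]
    act = activations(params, X)
    mask = (act > 0.0).astype(float)
    resid = act @ c / N - Y
    grad_c = (resid[:, None] * act).mean(axis=0)
    grad_w = (resid[:, None] * mask * c[None, :] * X[:, None]).mean(axis=0)
    grad_b = (resid[:, None] * mask * c[None, :]).mean(axis=0)
    params -= lr * np.stack([grad_c, grad_w, grad_b], axis=1)

    # --- reaction step: kill particles with above-average potential and
    # replace them with clones of below-average particles, so the total
    # particle count N (the total mass) stays fixed ---
    V = potential(params, X, Y)
    rate = alpha * (V - V.mean())
    dead = np.flatnonzero(rng.random(N) < dt * np.maximum(rate, 0.0))
    if 0 < dead.size < N:
        alive = np.setdiff1d(np.arange(N), dead)
        birth = np.maximum(-rate[alive], 0.0)
        p = birth / birth.sum() if birth.sum() > 0 else None
        params[dead] = params[rng.choice(alive, size=dead.size, p=p)]

    if step % 500 == 0:
        print(f"step {step:5d}  loss {0.5 * (resid ** 2).mean():.5f}")
```

In this discretization, a neuron whose potential exceeds the population average dies with probability proportional to the gap, and its slot is refilled by cloning a below-average neuron; this kill-and-clone exchange is the non-local mass transfer that the modified PDE describes, and it conserves the particle count by construction.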
