Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks

05/19/2019
by Kaitong Hu, et al.

We present a probabilistic analysis of the long-time behaviour of nonlocal, diffusive equations with a gradient flow structure in the 2-Wasserstein metric, namely the Mean-Field Langevin Dynamics (MFLD). Our work is motivated by the desire to provide a theoretical underpinning for the convergence of stochastic gradient-type algorithms widely used for non-convex learning tasks such as the training of deep neural networks. The key insight is that a certain class of finite-dimensional non-convex problems becomes convex when lifted to the infinite-dimensional space of measures. We leverage this observation and show that the corresponding energy functional defined on the space of probability measures has a unique minimiser, which can be characterised by a first-order condition using the notion of linear functional derivative. Next, we show that the flow of marginal laws induced by the MFLD converges to the stationary distribution, which is exactly the minimiser of the energy functional. We show that this convergence is exponential under conditions that are satisfied for highly regularised learning tasks. At the heart of our analysis is a pathwise perspective on Otto calculus used in the gradient flow literature, which is of independent interest. Our proof of convergence to the stationary probability measure is novel and relies on a generalisation of LaSalle's invariance principle. Importantly, we do not assume that the interaction potential of the MFLD is of convolution type, nor that it has any particular symmetric structure; this is critical for applications. Finally, we show that the error between the finite-dimensional optimisation problem and its infinite-dimensional limit is of order one over the number of parameters.
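To make the connection to training algorithms concrete, the following Python sketch (not taken from the paper) simulates the kind of finite particle system that approximates the MFLD: the neurons of a two-layer network are treated as interacting particles updated by noisy gradient descent, so that the empirical measure over the particles approximates the flow of marginal laws discussed above. The toy data, architecture, step size and noise level are illustrative assumptions, not the paper's setting.

import numpy as np

# Toy regression data.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0])

N = 500        # number of particles (neurons); the mean-field limit is N -> infinity
sigma = 0.1    # noise level (strength of the entropic regularisation)
lr = 0.05      # step size
steps = 2000

# Each particle holds the parameters (a, w, b) of one neuron.
params = rng.normal(0.0, 1.0, size=(N, 3))

def loss_and_grads(params):
    a, w, b = params[:, 0], params[:, 1], params[:, 2]
    h = np.tanh(X @ w[None, :] + b[None, :])   # (n_data, N) hidden activations
    pred = h @ a / N                           # mean-field (1/N) scaling of the output layer
    r = pred - y
    dpred = 2.0 * r / len(y)                   # derivative of the mean squared error w.r.t. pred
    ga = h.T @ dpred / N
    da = dpred[:, None] * (1.0 - h ** 2) * a[None, :] / N
    gw = (da * X).sum(axis=0)
    gb = da.sum(axis=0)
    return float((r ** 2).mean()), np.stack([ga, gw, gb], axis=1)

for t in range(steps):
    mse, g = loss_and_grads(params)
    if t % 500 == 0:
        print(f"step {t:5d}  mse {mse:.4f}")
    # Noisy gradient step: a time discretisation of Langevin dynamics for each particle.
    # The factor N turns the per-particle gradient of the loss into the mean-field
    # (intrinsic) derivative; the Gaussian term is the diffusion.
    params = params - lr * N * g + np.sqrt(2.0 * lr) * sigma * rng.normal(size=params.shape)

As N grows, the empirical measure of the particles approaches the infinite-dimensional limit, consistent with the one-over-the-number-of-parameters error bound stated in the abstract.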


research 07/14/2020
Global Convergence of Second-order Dynamics in Two-layer Neural Networks
Recent results have shown that for two-layer fully connected neural netw...

research 10/25/2021
On quantitative Laplace-type convergence results for some exponential probability measures, with two applications
Laplace-type results characterize the limit of sequence of measures (π_ε...

research 01/07/2018
Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models
We propose a new technique that boosts the convergence of training gener...

research 10/17/2021
A Riemannian Mean Field Formulation for Two-layer Neural Networks with Batch Normalization
The training dynamics of two-layer neural networks with batch normalizat...

research 03/11/2020
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Training deep neural networks with stochastic gradient descent (SGD) can...

research 12/11/2019
Mean-Field Neural ODEs via Relaxed Optimal Control
We develop a framework for the analysis of deep neural networks and neur...

research 10/26/2017
Interference Queueing Networks on Grids
Consider a countably infinite collection of coupled queues representing ...
