
Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks

by Kaitong Hu, et al.

We present a probabilistic analysis of the long-time behaviour of a class of nonlocal, diffusive equations with a gradient flow structure in the 2-Wasserstein metric, namely the Mean-Field Langevin Dynamics (MFLD). Our work is motivated by a desire to provide a theoretical underpinning for the convergence of stochastic gradient type algorithms widely used for non-convex learning tasks such as the training of deep neural networks. The key insight is that a certain class of finite-dimensional non-convex problems becomes convex when lifted to the infinite-dimensional space of measures. We leverage this observation and show that the corresponding energy functional, defined on the space of probability measures, has a unique minimiser, which can be characterised by a first-order condition using the notion of linear functional derivative. Next, we show that the flow of marginal laws induced by the MFLD converges to the stationary distribution, which is exactly the minimiser of the energy functional. We show that this convergence is exponential under conditions that are satisfied for highly regularised learning tasks. At the heart of our analysis is a pathwise perspective on the Otto calculus used in the gradient-flow literature, which is of independent interest. Our proof of convergence to the stationary probability measure is novel and relies on a generalisation of LaSalle's invariance principle. Importantly, we do not assume that the interaction potential of the MFLD is of convolution type, nor that it has any particular symmetric structure. This is critical for applications. Finally, we show that the error between the finite-dimensional optimisation problem and its infinite-dimensional limit is of order one over the number of parameters.
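For concreteness, the objects the abstract refers to can be written in the notation that is standard in the mean-field Langevin literature. The display below is a minimal sketch under that assumed notation (the symbols $F$, $\sigma$, $m_t$ and $\frac{\delta F}{\delta m}$ are ours, not taken from the paper), not the paper's exact formulation.

% Free energy on probability measures: a potential F (convex on measures)
% plus entropic regularisation at temperature sigma^2/2.
V^{\sigma}(m) \;=\; F(m) \;+\; \frac{\sigma^{2}}{2}\int \log m(x)\, m(x)\,\mathrm{d}x

% Mean-field Langevin dynamics: the drift is the gradient of the linear
% functional derivative of F evaluated at the current marginal law m_t.
\mathrm{d}X_{t} \;=\; -\,\nabla_{x}\frac{\delta F}{\delta m}(m_{t},X_{t})\,\mathrm{d}t \;+\; \sigma\,\mathrm{d}W_{t},
\qquad m_{t} = \mathrm{Law}(X_{t})

% First-order condition: the unique minimiser of V^sigma is a Gibbs-type
% fixed point of the linear functional derivative of F.
m^{*}(x) \;\propto\; \exp\!\Big(-\tfrac{2}{\sigma^{2}}\,\frac{\delta F}{\delta m}(m^{*},x)\Big)

Under this reading, the flow of marginal laws $(m_t)$ is the 2-Wasserstein gradient flow of $V^{\sigma}$, and the stationary distribution that the abstract identifies with the minimiser is the Gibbs-type fixed point above.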




Global Convergence of Second-order Dynamics in Two-layer Neural Networks

Recent results have shown that for two-layer fully connected neural netw...

Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models

We propose a new technique that boosts the convergence of training gener...

A Riemannian Mean Field Formulation for Two-layer Neural Networks with Batch Normalization

The training dynamics of two-layer neural networks with batch normalizat...

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

Training deep neural networks with stochastic gradient descent (SGD) can...

Gradient flows on graphons: existence, convergence, continuity equations

Wasserstein gradient flows on probability measures have found a host of ...

Stochastic Particle Gradient Descent for Infinite Ensembles

The superior performance of ensemble methods with infinite models are we...