On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

05/24/2018
by   Lenaïc Chizat, et al.
8

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/24/2019

Sparse Optimization on Measures with Over-parameterized Gradient Descent

Minimizing a convex function of a measure with a sparsity-inducing penal...
research
01/05/2019

Analysis of a Two-Layer Neural Network via Displacement Convexity

Fitting a function by using linear combinations of a large number N of `...
research
11/23/2021

Input Convex Gradient Networks

The gradients of convex functions are expressive models of non-trivial v...
research
09/17/2020

A Principle of Least Action for the Training of Neural Networks

Neural networks have been achieving high generalization performance on m...
research
09/18/2020

SISTA: learning optimal transport costs under sparsity constraints

In this paper, we describe a novel iterative procedure called SISTA to l...
research
09/17/2021

AdaLoss: A computationally-efficient and provably convergent adaptive gradient method

We propose a computationally-friendly adaptive learning rate schedule, "...
research
01/25/2023

Learning Gradients of Convex Functions with Monotone Gradient Networks

While much effort has been devoted to deriving and studying effective co...

Please sign up or login with your details

Forgot password? Click here to reset