
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

by Lenaïc Chizat et al.

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.
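The method described in the abstract can be illustrated on a toy sparse-spikes deconvolution problem: discretize the unknown measure into m weighted particles and run plain gradient descent on both the weights and the positions. The sketch below is an illustrative construction, not code from the paper; the Gaussian kernel, grid, step size, and particle count are all assumed choices.

```python
import numpy as np

# Toy sparse-spikes deconvolution: fit y(t) = sum_k w_k * K(t - x_k)
# by gradient descent on particle weights w_i AND positions x_i.
# All constants here are illustrative, not taken from the paper.

t = np.linspace(0.0, 1.0, 200)   # observation grid
s = 0.05                         # Gaussian kernel width
n = t.size

def phi(x):
    """Feature map: one Gaussian bump per particle, evaluated on the grid t."""
    return np.exp(-(t[None, :] - x[:, None]) ** 2 / (2 * s ** 2))

# Ground-truth measure: three spikes.
x_true = np.array([0.2, 0.5, 0.8])
w_true = np.array([1.0, 0.7, 1.2])
y = w_true @ phi(x_true)

# Over-parameterize: many more particles than true spikes, with
# positions spread over the whole domain ("initialized correctly").
m = 50
x = np.linspace(0.0, 1.0, m)
w = np.full(m, 1.0 / m)

def loss(w, x):
    r = w @ phi(x) - y
    return 0.5 * np.mean(r ** 2)

lr = 0.5
loss0 = loss(w, x)
for _ in range(5000):
    P = phi(x)                   # (m, n) feature matrix
    r = w @ P - y                # residual on the grid
    grad_w = P @ r / n                                           # dL/dw_i
    grad_x = w * ((P * (t[None, :] - x[:, None]) / s ** 2) @ r) / n  # dL/dx_i
    w -= lr * grad_w
    x -= lr * grad_x

print(loss0, loss(w, x))         # the loss should drop substantially
```

Although the objective is non-convex in the positions, with enough well-spread particles the descent typically drives the loss close to zero, mirroring the many-particle global-convergence phenomenon the paper analyzes.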


page 1

page 2

page 3

page 4


Related research:

* Sparse Optimization on Measures with Over-parameterized Gradient Descent
* Analysis of a Two-Layer Neural Network via Displacement Convexity
* Input Convex Gradient Networks
* A Principle of Least Action for the Training of Neural Networks
* SISTA: learning optimal transport costs under sparsity constraints
* AdaLoss: A computationally-efficient and provably convergent adaptive gradient method
* Learning Gradients of Convex Functions with Monotone Gradient Networks