On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

05/24/2018, by Lenaïc Chizat, et al. (Inria)

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. Examples include sparse spike deconvolution and training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles, and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.
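To make the particle discretization concrete, here is a minimal sketch (not the authors' code) of the idea for a one-hidden-layer ReLU network: each hidden unit is a "particle" with a position theta_j and a signed output weight w_j, and plain gradient descent on both approximates the continuous-time flow studied in the paper. The data, the ReLU nonlinearity, the particle count, and the step size are all illustrative assumptions.

```python
# Minimal sketch: particle gradient descent for a one-hidden-layer network.
# All quantities below (data, m, lr, iterations) are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

d, n, m = 2, 200, 500                   # input dim, samples, particles
X = rng.normal(size=(n, d))             # synthetic training inputs
y = np.maximum(X @ np.array([1.0, -0.5]), 0.0)  # synthetic target

# Each particle is a pair (w_j, theta_j): an output weight and a position.
theta = rng.normal(size=(m, d))
theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # init on the sphere
w = rng.choice([-1.0, 1.0], size=m) / m                # signed weights, 1/m scale

def predict(w, theta, X):
    """f(x) = sum_j w_j * relu(theta_j . x): a mixture of m particles."""
    return np.maximum(X @ theta.T, 0.0) @ w

lr = 0.5
for _ in range(2000):
    act = np.maximum(X @ theta.T, 0.0)          # (n, m) hidden activations
    resid = act @ w - y                          # prediction residual
    grad_w = act.T @ resid / n                   # gradient in the weights
    mask = (X @ theta.T > 0).astype(float)       # ReLU subgradient
    grad_theta = ((mask * resid[:, None]).T @ X) * w[:, None] / n
    w -= lr * grad_w                             # explicit Euler step on the
    theta -= lr * grad_theta                     # continuous-time gradient flow

print("train MSE:", np.mean((predict(w, theta, X) - y) ** 2))
```

Viewed through the lens of the paper, this loop evolves the empirical measure (1/m) sum_j delta_{(w_j, theta_j)}; as m grows, the dynamics approximate a Wasserstein gradient flow on the space of measures, which is the regime where the global convergence result applies.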
