Analysis of a Two-Layer Neural Network via Displacement Convexity

01/05/2019
by Adel Javanmard et al.

Fitting a function using linear combinations of a large number N of 'simple' components is one of the most fruitful ideas in statistical learning. This idea lies at the core of a variety of methods, from two-layer neural networks to kernel regression to boosting. In general, the resulting risk minimization problem is non-convex and is solved by gradient descent or its variants. Unfortunately, little is known about the global convergence properties of these approaches. Here we consider the problem of learning a concave function f on a compact convex domain Ω ⊆ R^d using linear combinations of 'bump-like' components (neurons). The parameters to be fitted are the centers of the N bumps, and the resulting empirical risk minimization problem is highly non-convex. We prove that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over Ω. Further, when the bump width δ tends to 0, this gradient flow has a limit which is a viscous porous medium equation. Remarkably, the cost function optimized by this gradient flow exhibits a special property known as displacement convexity, which implies exponential convergence rates for N → ∞, δ → 0. Surprisingly, this asymptotic theory appears to capture well the behavior for moderate values of δ and N. Explaining this phenomenon, and understanding the dependence on δ and N in a quantitative manner, remains an outstanding challenge.
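To make the setup concrete, below is a minimal illustrative sketch (not the authors' code) of the kind of model the abstract describes: a prediction formed as an average of N Gaussian bumps of width δ, where only the bump centers are trained by gradient descent on an empirical squared-error risk over a concave target on Ω = [0, 1]^2. The bump shape, the squared loss, the specific target function, and all parameter values are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only: average of N Gaussian bumps, trainable centers.
import numpy as np

rng = np.random.default_rng(0)

d, N, delta = 2, 200, 0.2                 # dimension, number of neurons, bump width
n_samples, lr, n_steps = 2000, 0.05, 2000

def f_target(x):
    # A concave, positive target on [0, 1]^2 (illustrative choice).
    return 7.0 / 6.0 - np.sum((x - 0.5) ** 2, axis=-1)

def bumps(x, w):
    # Normalized Gaussian bump of width delta centered at each w_i.
    # x: (n, d), w: (N, d) -> (n, N)
    sq = np.sum((x[:, None, :] - w[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * delta ** 2)) / (2 * np.pi * delta ** 2) ** (d / 2)

def predict(x, w):
    # Prediction is the average of the N bump components.
    return bumps(x, w).mean(axis=1)

X = rng.uniform(0.0, 1.0, size=(n_samples, d))   # samples from Omega
y = f_target(X)
centers = rng.uniform(0.0, 1.0, size=(N, d))     # trainable parameters: the centers

for step in range(n_steps):
    K = bumps(X, centers)                         # (n, N)
    resid = K.mean(axis=1) - y                    # (n,)
    # Gradient of the empirical risk w.r.t. each center w_i:
    # average over samples of resid * K * (x - w_i) / delta^2, scaled by 1/N.
    diff = X[:, None, :] - centers[None, :, :]    # (n, N, d)
    grad = (resid[:, None, None] * K[:, :, None] * diff).mean(axis=0) / (N * delta ** 2)
    centers -= lr * grad

print("empirical risk:", np.mean((predict(X, centers) - y) ** 2))
```

In this sketch the prediction plays the role of a kernel density estimate of the empirical distribution of the centers, which is why, as N grows, the gradient-descent dynamics can be described as a flow on probability distributions over Ω.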


Related research:

- 05/24/2018: On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
- 02/19/2020: A Unified Convergence Analysis for Shuffling-Type Gradient Methods
- 11/04/2018: A Function Fitting Method
- 04/19/2023: Leveraging the two timescale regime to demonstrate convergence of neural networks
- 10/09/2018: Collective evolution of weights in wide neural networks
- 07/24/2019: Sparse Optimization on Measures with Over-parameterized Gradient Descent
- 09/02/2022: Optimizing the Performative Risk under Weak Convexity Assumptions
