Theoretical insights into the optimization landscape of over-parameterized shallow neural networks

07/16/2017
by   Mahdi Soltanolkotabi, et al.

In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime, where the number of observations is smaller than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for arbitrary training data consisting of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients.
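To make the quadratic-activation setting concrete, below is a minimal NumPy sketch (not the authors' code) of plain gradient descent on a shallow network f(x) = sum_j (w_j^T x)^2, fit to realizable data generated by a planted weight matrix. The dimensions, step size, and variable names are illustrative assumptions, chosen so that the number of parameters k*d exceeds the number of samples n, matching the over-parameterized regime described above.

    # Minimal illustrative sketch (not the authors' code): gradient descent on a
    # shallow network with quadratic activations and unit output weights,
    # f(x) = sum_j (w_j^T x)^2, trained on data from a planted model.
    import numpy as np

    rng = np.random.default_rng(0)
    d, k, n = 10, 25, 100                    # input dim, hidden units, samples; k*d = 250 > n

    # Realizable data: i.i.d. Gaussian inputs, labels from a planted weight matrix W_star.
    W_star = rng.standard_normal((k, d)) / np.sqrt(d)
    X = rng.standard_normal((n, d))
    y = ((X @ W_star.T) ** 2).sum(axis=1)    # y_i = sum_j (w*_j^T x_i)^2

    def loss_and_grad(W):
        """Least-squares loss 0.5 * mean((f(x_i) - y_i)^2) and its gradient in W."""
        Z = X @ W.T                          # (n, k) pre-activations w_j^T x_i
        resid = (Z ** 2).sum(axis=1) - y     # residuals of the quadratic-activation network
        loss = 0.5 * np.mean(resid ** 2)
        # d f(x_i)/d w_j = 2 (w_j^T x_i) x_i, so grad_j = (2/n) * sum_i resid_i * Z_ij * x_i
        grad = (2.0 / n) * (Z * resid[:, None]).T @ X
        return loss, grad

    W = 0.1 * rng.standard_normal((k, d))    # small random initialization
    lr = 5e-3                                # conservative step size for this scale
    for t in range(5001):
        loss, grad = loss_and_grad(W)
        W -= lr * grad
        if t % 1000 == 0:
            print(f"iter {t:5d}   loss {loss:.3e}")

Because the prediction depends on the weights only through W^T W, this objective is closely related to a rank-constrained matrix least-squares problem, which is the kind of structure behind the benign-landscape result; with the illustrative settings above, the training loss typically decreases steadily toward zero.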


Related research

05/10/2017 – Learning ReLUs via Gradient Descent
In this paper we study the problem of learning Rectified Linear Units (R...

03/03/2018 – On the Power of Over-parametrization in Neural Networks with Quadratic Activation
We provide new theoretical insights on why over-parametrization is effec...

10/09/2019 – Nearly Minimal Over-Parametrization of Shallow Neural Networks
A recent line of work has shown that an overparametrized neural network ...

02/12/2019 – Towards moderate overparameterization: global convergence guarantees for training shallow neural networks
Many modern neural network architectures are trained in an overparameter...

06/27/2020 – Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions
We study the dynamics of optimization and the generalization properties ...

10/01/2022 – A Combinatorial Perspective on the Optimization of Shallow ReLU Networks
The NP-hard problem of optimizing a shallow ReLU network can be characte...

03/05/2020 – Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations
We describe a procedure for removing dependency on a cohort of training ...
