On the Power of Over-parametrization in Neural Networks with Quadratic Activation

03/03/2018
by   Simon S. Du, et al.
0

We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a k hidden node shallow network with quadratic activation and n training data points, we show as long as k >√(2n), over-parametrization enables local search algorithms to find a globally optimal solution for general smooth and convex loss functions. Further, despite that the number of parameters may exceed the sample size, using theory of Rademacher complexity, we show with weight decay, the solution also generalizes well if the data is sampled from a regular distribution such as Gaussian. To prove when k>√(2n), the loss function has benign landscape properties, we adopt an idea from smoothed analysis, which may have other applications in studying loss surfaces of neural networks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/16/2017

Theoretical insights into the optimization landscape of over-parameterized shallow neural networks

In this paper we study the problem of learning a shallow artificial neur...
research
12/31/2019

No Spurious Local Minima in Deep Quadratic Networks

Despite their practical success, a theoretical understanding of the loss...
research
04/21/2021

MLDS: A Dataset for Weight-Space Analysis of Neural Networks

Neural networks are powerful models that solve a variety of complex real...
research
12/02/2020

Neural Teleportation

In this paper, we explore a process called neural teleportation, a mathe...
research
10/18/2017

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

The past decade has witnessed a successful application of deep learning ...
research
01/08/2019

Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions

Quantification of the stationary points and the associated basins of att...
research
10/09/2018

Learning One-hidden-layer Neural Networks under General Input Distributions

Significant advances have been made recently on training neural networks...

Please sign up or login with your details

Forgot password? Click here to reset