Stationary Points of Shallow Neural Networks with Quadratic Activation Function

12/03/2019
by David Gamarnik, et al.

We consider the problem of learning shallow neural networks with quadratic activation function and planted weights W^*∈R^{m×d}, where m is the width of the hidden layer and d≤m is the data dimension. We establish that the landscape of the population risk L(W) admits an energy barrier separating rank-deficient matrices from the global optima: if W∈R^{m×d} with rank(W)<d, then L(W) ≥ 2σ_min(W^*)^4, where σ_min(W^*) is the smallest singular value of W^*. We then establish that all full-rank stationary points of L(·) are necessarily global optima. Together, these two results suggest a simple explanation for the success of gradient descent in training such networks when properly initialized: gradient descent finds a global optimum because there are no spurious stationary points within the set of full-rank matrices. We then show that if the planted weight matrix W^*∈R^{m×d} has i.i.d. Gaussian entries and the network is sufficiently wide, that is, m > Cd^2 for a sufficiently large constant C, then it is easy to construct a full-rank matrix W with population risk below the energy barrier; starting from such a W, gradient descent is guaranteed to converge to a global optimum. Our final focus is on sample complexity: we identify a simple necessary and sufficient geometric condition on the training data under which any minimizer of the empirical loss necessarily has small generalization error. We show that as soon as n ≥ n^* = d(d+1)/2, random data satisfies this geometric condition almost surely, and in fact the generalization error is then zero. At the same time, we show that if n < n^* and the data are i.i.d. Gaussian, there always exists a matrix W with zero empirical risk but with population risk bounded away from zero by the same amount as for rank-deficient matrices, namely 2σ_min(W^*)^4.
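To make the setting concrete, here is a minimal NumPy sketch (not from the paper) of the planted model the abstract describes: a shallow network with quadratic activation whose labels are generated by planted weights W^*, together with a Monte Carlo proxy for the population risk under standard Gaussian inputs. The specific dimensions, the Gaussian input distribution used for the proxy, and all function names are illustrative assumptions of this sketch rather than details taken from the paper.

```python
# Minimal sketch (assumptions, not the paper's code) of a shallow network with
# quadratic activation and planted weights: f(W; x) = sum_j (w_j^T x)^2 = ||W x||^2.
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 20                       # data dimension d and hidden-layer width m >= d

W_star = rng.standard_normal((m, d))   # planted weights W* with i.i.d. Gaussian entries

def predict(W, X):
    # Network output ||W x||^2, computed row-wise over the data matrix X (n x d).
    return np.sum((X @ W.T) ** 2, axis=1)

def risk(W, X):
    # Squared loss against labels generated by the planted weights W*.
    y = predict(W_star, X)
    return np.mean((predict(W, X) - y) ** 2)

# Monte Carlo proxy for the population risk L(W), assuming x ~ N(0, I_d)
# (an assumption of this sketch; the paper analyzes the exact population risk).
X_pop = rng.standard_normal((200_000, d))

# A rank-deficient W should sit above the energy barrier 2 * sigma_min(W*)^4 ...
W_deficient = W_star.copy()
W_deficient[:, -1] = 0.0                # zero a column, so rank(W_deficient) < d
barrier = 2 * np.linalg.svd(W_star, compute_uv=False).min() ** 4
print(risk(W_deficient, X_pop), ">=", barrier)

# ... while W* itself (or Q W* for any orthogonal Q) is a global optimum with zero risk.
print(risk(W_star, X_pop))              # ~ 0
```

Since the output depends on W only through W^T W, any W with W^T W = W^{*T} W^* attains zero risk, which is why the optima form an orbit under left multiplication by orthogonal matrices.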


