On the Complexity of Learning Neural Networks

07/14/2017
by Le Song, et al.

The stunning empirical successes of neural networks currently lack rigorous theoretical explanation. What form would such an explanation take, in the face of existing complexity-theoretic lower bounds? A first step might be to show that data generated by neural networks with a single hidden layer, smooth activation functions, and benign input distributions can be learned efficiently. We demonstrate here a comprehensive lower bound ruling out this possibility: for a wide class of activation functions (including all those in practical use), and inputs drawn from any logconcave distribution, there is a family of one-hidden-layer functions, with a sum gate at the output, that is hard to learn in the following precise sense: any statistical query algorithm (a class that includes all known variants of stochastic gradient descent, with any loss function) needs an exponential number of queries, even with tolerance inversely proportional to the input dimensionality. Moreover, this hard family of functions is realizable with a small (sublinear in dimension) number of activation units in the single hidden layer. The lower bound is also robust to small perturbations of the true weights. Systematic experiments illustrate a phase transition in the training error as predicted by the analysis.
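
To make the setting concrete, here is a minimal sketch (not the paper's construction) of the kind of setup the abstract describes: data generated by a one-hidden-layer "teacher" network whose output is a sum gate, with Gaussian (hence logconcave) inputs, fitted by plain minibatch SGD on squared loss, itself one instance of a statistical query algorithm. The sigmoid activation, random teacher weights, and all dimensions and hyperparameters below are illustrative assumptions; the paper's hard family uses specifically chosen weights, and this toy setup will not by itself exhibit the hardness or the phase transition.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, k, m = 50, 10, 5000   # input dimension, hidden units, training samples (illustrative)

# "Teacher": f(x) = sum_i sigmoid(w_i . x), a sum gate over one hidden layer.
# Random Gaussian weights here are purely illustrative; the paper's hard
# family relies on a specific construction of the w_i.
W_true = rng.standard_normal((k, n)) / np.sqrt(n)
X = rng.standard_normal((m, n))          # standard Gaussian inputs (logconcave)
y = sigmoid(X @ W_true.T).sum(axis=1)

# Student of the same architecture, trained with plain minibatch SGD on squared loss.
W = rng.standard_normal((k, n)) / np.sqrt(n)
lr, batch = 0.1, 32
for step in range(20_000):
    idx = rng.integers(0, m, size=batch)
    Xb, yb = X[idx], y[idx]
    H = sigmoid(Xb @ W.T)                # hidden activations, shape (batch, k)
    err = H.sum(axis=1) - yb             # residual of the sum-gate output
    # Gradient of 0.5 * mean squared error with respect to W.
    grad = ((err[:, None] * H * (1.0 - H)).T @ Xb) / batch
    W -= lr * grad

train_mse = np.mean((sigmoid(X @ W.T).sum(axis=1) - y) ** 2)
print(f"final training MSE: {train_mse:.4f}")
```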

Related research

11/12/2019  Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks
We study the sample complexity of learning one-hidden-layer convolutiona...

09/18/2022  Is Stochastic Gradient Descent Near Optimal?
The success of neural networks over the past decade has established them...

05/03/2018  Lifted Neural Networks
We describe a novel family of models of multi-layer feedforward neural ...

08/10/2021  Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay
In this paper, we present a spectral-based approach to study the linear ...

03/28/2020  Memorizing Gaussians with no over-parameterization via gradient decent on neural networks
We prove that a single step of gradient decent over depth two network, w...

10/29/2020  Over-parametrized neural networks as under-determined linear systems
We draw connections between simple neural networks and under-determined ...

10/05/2014  Understanding Locally Competitive Networks
Recently proposed neural network activation functions such as rectified ...
