Neural Networks Learning and Memorization with (almost) no Over-Parameterization

by Amit Daniely, et al.

Many results in recent years have established polynomial-time learnability of various models via neural network algorithms. However, unless the model is linearly separable or the activation is a polynomial, these results require very large networks, much larger than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth-two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with near-optimal network size, sample complexity, and runtime. In particular, we show that SGD on a depth-two network with Õ(m/d) hidden neurons (and hence Õ(m) parameters) can memorize m random labeled points in S^{d-1}.
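To make the memorization setting concrete, here is a toy sketch of the experiment the abstract describes: m points drawn uniformly from the unit sphere S^{d-1}, random ±1 labels, and plain SGD on a depth-two ReLU network whose hidden width scales like m/d. Everything beyond that setting is an assumption for illustration, not the paper's algorithm or analysis: the width rule `hidden_width` (constant c = 4, log factors dropped), the Gaussian initialization scale, the logistic loss, the learning rate, and training both layers are all hypothetical choices, and running it does not verify the theorem.

```python
import math
import random


def random_sphere_point(d, rng):
    """Sample uniformly from S^{d-1} by normalizing a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def hidden_width(m, d, c=4.0):
    """Hypothetical width rule q = ceil(c * m / d); mimics O~(m/d) with log factors dropped."""
    return max(1, math.ceil(c * m / d))


def sigmoid(z):
    # Numerically stable logistic function for either sign of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(W, a, x):
    """Depth-two ReLU net f(x) = sum_j a_j * relu(<w_j, x>); also returns pre-activations."""
    pre = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    out = sum(aj * max(0.0, z) for aj, z in zip(a, pre))
    return out, pre


def logistic_loss(margin):
    # log(1 + exp(-margin)), computed stably for both signs of the margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_loss(W, a, data):
    return sum(logistic_loss(y * forward(W, a, x)[0]) for x, y in data) / len(data)


def train_accuracy(W, a, data):
    return sum(1 for x, y in data if y * forward(W, a, x)[0] > 0) / len(data)


def train_sgd(data, d, q, lr=0.1, epochs=200, seed=0):
    """Plain per-example SGD on both layers (assumed setup, not the paper's exact procedure)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(q)]
    a = [rng.choice((-1.0, 1.0)) / math.sqrt(q) for _ in range(q)]
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            out, pre = forward(W, a, x)
            # Derivative of log(1 + exp(-y * out)) with respect to out.
            g = -y * sigmoid(-y * out)
            for j in range(q):
                if pre[j] > 0.0:  # inactive ReLUs receive zero gradient
                    coef = g * a[j]  # uses a_j before its own update
                    a[j] -= lr * g * pre[j]
                    W[j] = [wi - lr * coef * xi for wi, xi in zip(W[j], x)]
    return W, a
```

For example, with m = 40 random labeled points in d = 10 dimensions, the assumed rule gives q = 16 hidden neurons; calling `train_sgd(data, 10, 16)` and comparing `mean_loss` and `train_accuracy` before and after training shows how far this toy setup gets toward fitting random labels.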
