Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

11/12/2018
by   Zeyuan Allen-Zhu, et al.
0

Neural networks have great success in many machine learning applications, but the fundamental learning theory behind them remains largely unsolved. Learning neural networks is NP-hard, but in practice, simple algorithms like stochastic gradient descent (SGD) often produce good solutions. Moreover, it is observed that overparameterization --- designing networks whose number of parameters is larger than statistically needed to perfectly fit the data --- improves both optimization and generalization, appearing to contradict traditional learning theory. In this work, we extend the theoretical understanding of two and three-layer neural networks in the overparameterized regime. We prove that, using overparameterized neural networks, one can (improperly) learn some notable hypothesis classes, including two and three-layer neural networks with fewer parameters. Moreover, the learning process can be simply done by SGD or its variants in polynomial time using polynomially many samples. We also show that for a fixed sample size, the generalization error of the solution found by some SGD variant can be made almost independent of the number of parameters in the overparameterized network.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/03/2018

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Neural networks have many successful applications, while much less theor...
research
02/26/2021

Experiments with Rich Regime Training for Deep Learning

In spite of advances in understanding lazy training, recent work attribu...
research
11/22/2019

Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Many results in recent years established polynomial time learnability of...
research
05/29/2023

Escaping mediocrity: how two-layer networks learn hard single-index models with SGD

This study explores the sample complexity for two-layer neural networks ...
research
05/25/2019

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

In deep neural nets, lower level embedding layers account for a large po...
research
08/13/2020

The Slow Deterioration of the Generalization Error of the Random Feature Model

The random feature model exhibits a kind of resonance behavior when the ...
research
11/25/2019

Empirical Study of Easy and Hard Examples in CNN Training

Deep Neural Networks (DNNs) generalize well despite their massive size a...

Please sign up or login with your details

Forgot password? Click here to reset