Big Neural Networks Waste Capacity

01/16/2013
by Yann N. Dauphin, et al.

This article examines the failure of some big neural networks to leverage added capacity to reduce underfitting. Past research suggests diminishing returns when increasing the size of neural networks. Our experiments on ImageNet LSVRC-2010 show that this may be because added capacity yields sharply diminishing returns in training error, leading to underfitting. This suggests that the optimization method, first-order gradient descent, fails in this regime. Directly attacking this problem, either through the optimization method or the choice of parametrization, may make it possible to improve generalization error on large datasets, for which a large capacity is required.
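The kind of capacity sweep described above can be illustrated with a minimal sketch. This is not the authors' ImageNet LSVRC-2010 setup: it is an assumed toy experiment (synthetic data, illustrative hyperparameters, PyTorch) that trains MLPs of increasing hidden width with plain first-order SGD and reports the final training error, so one can check whether added capacity keeps reducing underfitting.

```python
# Hypothetical capacity-sweep sketch, not the paper's experiment:
# train MLPs of growing width with plain SGD on a fixed synthetic task
# and report the *training* error reached by each model size.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(4096, 256)           # synthetic inputs
y = torch.randint(0, 10, (4096,))    # synthetic class labels

def train_error(width, epochs=50, lr=0.01):
    model = nn.Sequential(nn.Linear(256, width), nn.ReLU(),
                          nn.Linear(width, 10))
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # first-order gradient descent
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    with torch.no_grad():
        pred = model(X).argmax(dim=1)
    return (pred != y).float().mean().item()  # error on the training set

for width in [128, 512, 2048, 8192]:  # increasing capacity
    print(f"width={width}  training error={train_error(width):.3f}")
```

If the training error stops improving as the width grows, the extra capacity is being wasted by the optimizer rather than being limited by the data, which is the underfitting regime the article points to.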


Related research

- 08/15/2020: The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization. Modern deep learning models employ considerably more parameters than req...
- 08/25/2020: Stochastic Markov Gradient Descent and Training Low-Bit Neural Networks. The massive size of modern neural networks has motivated substantial rec...
- 10/22/2020: Beyond Lazy Training for Over-parameterized Tensor Decomposition. Over-parametrization is an important technique in training neural networ...
- 07/27/2020: Universality of Gradient Descent Neural Network Training. It has been observed that design choices of neural networks are often cr...
- 11/17/2022: Why Deep Learning Generalizes. Very large deep learning models trained using gradient descent are remar...
- 11/24/2015: Dynamic Capacity Networks. We introduce the Dynamic Capacity Network (DCN), a neural network that c...
- 02/16/2022: On Measuring Excess Capacity in Neural Networks. We study the excess capacity of deep networks in the context of supervis...
