Why Deep Learning Generalizes

11/17/2022
by Benjamin L. Badger, et al.

Very large deep learning models trained by gradient descent are remarkably resistant to memorization given their enormous capacity, yet at the same time they are capable of fitting large datasets of pure noise. Here we introduce methods by which models can be trained to memorize datasets that would normally be generalized. We find that memorization is difficult relative to generalization, but that adding noise makes memorization easier. Increasing the dataset size exaggerates the characteristics of that dataset: giving the model access to more training samples makes overfitting easier for random data, but somewhat harder for natural images. The bias of deep learning towards generalization is explored theoretically, and we show that generalization results from a model's parameters being attracted, during gradient descent, to points of maximal stability with respect to the model's inputs.
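To make the experimental setup concrete, below is a minimal, self-contained sketch (not the paper's code) of two of the claims above: fitting random labels is harder than fitting structured ones, and the memorizing solution is less stable with respect to its inputs. The MLP architecture, the Adam optimizer, the synthetic 64-dimensional inputs, and the use of the input-gradient norm as a rough proxy for input stability are all assumptions made for this illustration.

import torch
import torch.nn as nn

torch.manual_seed(0)

def fit(x, y, steps=3000, lr=1e-3):
    # Small MLP classifier; the width, depth, and optimizer settings
    # are illustrative assumptions, not the paper's configuration.
    model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                          nn.Linear(256, 256), nn.ReLU(),
                          nn.Linear(256, 10))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model, loss.item()

def input_grad_norm(model, x, y):
    # Rough proxy for stability with respect to inputs: the norm of the
    # loss gradient taken with respect to the input batch (an assumption
    # of this sketch, not the paper's stability measure).
    x = x.clone().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), y)
    loss.backward()
    return x.grad.norm().item()

n = 1000
x = torch.randn(n, 64)

# Structured task: labels are a deterministic function of the inputs,
# so gradient descent can find a rule rather than a lookup table.
w = torch.randn(64, 10)
y_structured = (x @ w).argmax(dim=1)

# Memorization task: labels are pure noise, so the only way to fit
# them is to memorize each example individually.
y_noise = torch.randint(0, 10, (n,))

m_s, l_s = fit(x, y_structured)
m_n, l_n = fit(x, y_noise)
print(f"structured labels: final loss {l_s:.4f}, "
      f"input-grad norm {input_grad_norm(m_s, x, y_structured):.4f}")
print(f"random labels:     final loss {l_n:.4f}, "
      f"input-grad norm {input_grad_norm(m_n, x, y_noise):.4f}")

On a typical run the structured task reaches low loss sooner and ends with a smaller input-gradient norm than the noise task, mirroring the abstract's claims; exact numbers depend on the seed and hyperparameters.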


Related research

08/13/2023: Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods
"Stochastic gradient descent (SGD) and adaptive gradient methods, such as..."

02/25/2020: Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization
"An open question in the Deep Learning community is why neural networks t..."

02/13/2018: Towards Understanding the Generalization Bias of Two Layer Convolutional Linear Classifiers with Gradient Descent
"A major challenge in understanding the generalization of deep learning i..."

03/27/2019: Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
"Modern neural networks are typically trained in an over-parameterized re..."

06/09/2022: Uncovering bias in the PlantVillage dataset
"We report our investigation on the use of the popular PlantVillage datas..."

01/16/2013: Big Neural Networks Waste Capacity
"This article exposes the failure of some big neural networks to leverage..."

02/06/2022: Anticorrelated Noise Injection for Improved Generalization
"Injecting artificial noise into gradient descent (GD) is commonly employ..."
