SGD on Neural Networks Learns Functions of Increasing Complexity

05/28/2019
by   Preetum Nakkiran, et al.

We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity. This hypothesis can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime. We also show that the linear classifier learned in the initial stages is "retained" throughout the execution even if training is continued to the point of zero training error, and complement this with a theoretical result in a simplified model. Key to our work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information.
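The abstract's measure of "how well one classifier explains the performance of another" is built on conditional mutual information. As a rough, hypothetical illustration (a minimal sketch, not the paper's exact definition or experimental protocol), the code below uses plug-in estimates over held-out predictions to gauge how much of a network F's information about the labels Y is already accounted for by a linear classifier L, via the quantity I(F; Y) - I(F; Y | L). The names f_pred, l_pred, and y are assumptions introduced only for this example.

import numpy as np

def empirical_mi(a, b):
    """Plug-in estimate of the mutual information I(A; B) in bits,
    from paired samples of two discrete variables."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx, b_idx), 1.0)   # joint count table
    joint /= joint.sum()                    # joint distribution p(a, b)
    pa = joint.sum(axis=1, keepdims=True)   # marginal p(a)
    pb = joint.sum(axis=0, keepdims=True)   # marginal p(b)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])).sum())

def conditional_mi(a, b, c):
    """Plug-in estimate of I(A; B | C) = sum over c of p(c) * I(A; B | C = c)."""
    total = 0.0
    for c_val in np.unique(c):
        idx = (c == c_val)
        total += idx.mean() * empirical_mi(a[idx], b[idx])
    return total

# Hypothetical usage: f_pred holds the SGD-trained network's predicted labels,
# l_pred a linear classifier's predicted labels, and y the true labels,
# all evaluated on the same held-out set.
#
#   explained = empirical_mi(f_pred, y) - conditional_mi(f_pred, y, l_pred)
#
# The difference I(F; Y) - I(F; Y | L) is one way to quantify how much of the
# network's predictive information about the labels the linear model already
# captures; the paper's claim is that early in training this accounts for
# almost all of the network's performance gain.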

Related research

How noise affects the Hessian spectrum in overparameterized neural networks (10/01/2019)

The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent (04/15/2019)

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data (08/03/2018)

Visualizing Information Bottleneck through Variational Inference (12/24/2022)

Random initialisations performing above chance and how to find them (09/15/2022)

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics (02/21/2023)

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit (07/18/2022)
