Stiffness: A New Perspective on Generalization in Neural Networks

01/28/2019
by Stanislav Fort, et al.

We investigate neural network training and generalization using the concept of stiffness. We measure how stiff a network is by looking at how a small gradient step on one example affects the loss on another example. In particular, we study how stiffness varies with 1) class membership, 2) distance between data points (in the input space as well as in latent spaces), 3) training iteration, and 4) learning rate. We empirically study the evolution of stiffness on MNIST, FASHION MNIST, CIFAR-10 and CIFAR-100 using fully-connected and convolutional neural networks. Our results demonstrate that stiffness is a useful concept for diagnosing and characterizing generalization. We observe that small learning rates lead to initial learning of more specific features that do not translate well into improvements on inputs from all classes, whereas high learning rates initially benefit all classes at once. Measuring stiffness as a function of distance between data points, we observe that higher learning rates induce a positive correlation between changes in loss at data points that are further apart, pointing towards a regularization effect of the learning rate. When training on CIFAR-100, the stiffness matrix exhibits a coarse-grained behavior suggestive of the model's awareness of super-class membership.
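The stiffness measurement described above is straightforward to compute. Below is a minimal sketch (not the authors' code), assuming a PyTorch classifier and integer class labels; the function names per_example_grad, sign_stiffness, and class_stiffness_matrix are ours, chosen for illustration.

    import torch

    def per_example_grad(model, loss_fn, x, y):
        # Flattened gradient of the loss on a single example
        # with respect to all model parameters.
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])

    def sign_stiffness(model, loss_fn, x1, y1, x2, y2):
        # +1 when a small gradient step on (x1, y1) also lowers the
        # loss on (x2, y2), i.e. the per-example gradients are aligned;
        # -1 when improving one example hurts the other.
        g1 = per_example_grad(model, loss_fn, x1, y1)
        g2 = per_example_grad(model, loss_fn, x2, y2)
        return torch.sign(torch.dot(g1, g2)).item()

    def class_stiffness_matrix(model, loss_fn, xs, ys, num_classes):
        # Average sign-stiffness over example pairs, grouped by class
        # pair. On CIFAR-100, the abstract reports coarse-grained
        # structure in this kind of matrix.
        total = torch.zeros(num_classes, num_classes)
        count = torch.zeros(num_classes, num_classes)
        for i in range(len(xs)):
            for j in range(i + 1, len(xs)):
                s = sign_stiffness(model, loss_fn, xs[i], ys[i], xs[j], ys[j])
                a, b = int(ys[i]), int(ys[j])
                total[a, b] += s; count[a, b] += 1
                total[b, a] += s; count[b, a] += 1
        return total / count.clamp(min=1)

In practice, one would evaluate such quantities on held-out data at different training iterations and learning rates, matching the dimensions of variation studied in the abstract.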

Related research

07/10/2019
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Stochastic gradient descent with a large initial learning rate is a wide...

07/22/2023
The instabilities of large learning rate training: a loss landscape view
Modern neural networks are undeniably successful. Numerous works study h...

07/03/2021
Slope and generalization properties of neural networks
Neural networks are very successful tools in for example advanced classi...

07/06/2018
The Goldilocks zone: Towards better understanding of neural network loss landscapes
We explore the loss landscape of fully-connected neural networks using r...

12/14/2022
Maximal Initial Learning Rates in Deep ReLU Networks
Training a neural network requires choosing a suitable learning rate, in...

12/15/2021
Robust Neural Network Classification via Double Regularization
The presence of mislabeled observations in data is a notoriously challen...

05/13/2023
Efficient Asynchronize Stochastic Gradient Algorithm with Structured Data
Deep learning has achieved impressive success in a variety of fields bec...