Loss Spike in Training Neural Networks

05/20/2023
by Zhongwang Zhang, et al.

In this work, we study the mechanism underlying loss spikes observed during neural network training. When training enters a region with a smaller-loss-as-sharper (SLAS) structure, it becomes unstable: once the landscape is too sharp, the loss grows exponentially, producing the rapid ascent of the loss spike. Training stabilizes again once it finds a flat region. We find that the deviation along the first eigen direction (the direction of the maximum eigenvalue λ_max of the loss Hessian) is dominated by low-frequency components. Since low-frequency components are learned very quickly (the frequency principle), a rapid descent then follows. Inspired by our analysis of loss spikes, we revisit the link between λ_max-based flatness and generalization. For real datasets, the low-frequency content is often dominant and is well captured by both the training data and the test data. Hence a solution with good generalization and a solution with poor generalization can both learn the low-frequency content well, so they differ little along the sharpest direction. Therefore, although λ_max can indicate the sharpness of the loss landscape, the deviation along its corresponding eigen direction is not responsible for the generalization difference. We also find that loss spikes can facilitate condensation, i.e., the input weights of different neurons evolving toward the same direction, which may be the underlying mechanism by which loss spikes improve generalization, rather than simply controlling the value of λ_max.
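As a minimal sketch of the sharpness measure the abstract refers to, the snippet below estimates λ_max, the largest eigenvalue of the loss Hessian, by power iteration on Hessian-vector products. This is a generic illustration, not code from the paper; the model, loss function, and batch names are placeholders.

```python
# Sketch: estimate lambda_max of the loss Hessian via power iteration
# on Hessian-vector products (placeholder model/loss/batch names).
import torch

def estimate_lambda_max(model, loss_fn, x, y, iters=20):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    # Keep the graph so we can differentiate the gradient again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Random initial direction with the same shapes as the parameters.
    v = [torch.randn_like(p) for p in params]
    eigenvalue = 0.0
    for _ in range(iters):
        # Normalize the current direction.
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]
        # Hessian-vector product: d(grad . v)/d(params).
        gv = sum((g * u).sum() for g, u in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient gives the current eigenvalue estimate.
        eigenvalue = sum((h * u).sum() for h, u in zip(hv, v)).item()
        v = [h.detach() for h in hv]
    return eigenvalue
```

Tracking this estimate along a training run is one way to check whether the loss is entering a sharper region (growing λ_max) before a spike, in the spirit of the SLAS picture described above.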

Related research

07/03/2018  Training behavior of deep neural network in frequency domain
  Why deep neural networks (DNNs) capable of overfitting often generalize ...

11/01/2021  A variance principle explains why dropout finds flatter minima
  Although dropout has achieved great success in deep learning, little is ...

03/15/2022  Surrogate Gap Minimization Improves Sharpness-Aware Training
  The recently proposed Sharpness-Aware Minimization (SAM) improves genera...

12/16/2021  Visualizing the Loss Landscape of Winning Lottery Tickets
  The underlying loss landscapes of deep neural networks have a great impa...

05/28/2022  A Quadrature Perspective on Frequency Bias in Neural Network Training with Nonuniform Data
  Small generalization errors of over-parameterized neural networks (NNs) ...

02/20/2020  Do We Need Zero Training Loss After Achieving Zero Training Error?
  Overparameterized deep networks have the capacity to memorize training d...

04/24/2022  The Multiscale Structure of Neural Network Loss Functions: The Effect on Optimization and Origin
  Local quadratic approximation has been extensively used to study the opt...
