There is a Singularity in the Loss Landscape

01/12/2022
by Mark Lowell, et al.

Despite the widespread adoption of neural networks, their training dynamics remain poorly understood. We show experimentally that as the size of the dataset increases, a point forms where the magnitude of the gradient of the loss becomes unbounded. Gradient descent rapidly brings the network close to this singularity in parameter space, and further training takes place near it. This singularity explains a variety of phenomena recently observed in the Hessian of neural network loss functions, such as training on the edge of stability and the concentration of the gradient in a top subspace. Once the network approaches the singularity, the top subspace contributes little to learning, even though it constitutes the majority of the gradient.
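
The sketch below is not from the paper; it is one minimal way, assuming a hypothetical PyTorch `model`, `loss_fn`, and batch `(x, y)`, to track the quantities the abstract mentions: the gradient norm, the sharpness (top Hessian eigenvalue), and the fraction of the gradient lying along the top Hessian direction.

```python
# Minimal sketch (not the paper's code): monitor gradient norm, sharpness
# (top Hessian eigenvalue), and gradient alignment with the top Hessian
# direction. `model`, `loss_fn`, and the batch (x, y) are assumed placeholders.
import torch


def flatten(tensors):
    """Concatenate a sequence of tensors into a single 1-D vector."""
    return torch.cat([t.reshape(-1) for t in tensors])


def gradient_and_sharpness(loss, params, iters=50):
    """Return the flattened gradient, the top Hessian eigenvalue ("sharpness"),
    and the corresponding eigenvector, estimated by power iteration on
    Hessian-vector products (double backprop)."""
    g = flatten(torch.autograd.grad(loss, params, create_graph=True))
    v = torch.randn_like(g)
    v /= v.norm()
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: d(g . v)/dtheta = H v
        hv = flatten(torch.autograd.grad(g @ v, params, retain_graph=True))
        eig = (v @ hv).item()
        v = hv / (hv.norm() + 1e-12)
    return g.detach(), eig, v


# Hypothetical usage inside a training loop:
# params = [p for p in model.parameters() if p.requires_grad]
# loss = loss_fn(model(x), y)
# g, sharpness, v = gradient_and_sharpness(loss, params)
# alignment = (g @ v).abs() / g.norm()  # share of gradient along top direction
# print(f"|grad|={g.norm().item():.3e}  sharpness={sharpness:.3e}  "
#       f"alignment={alignment.item():.3f}")
```

Power iteration only recovers the single top eigendirection; reproducing the paper's top-subspace measurements would require deflating and repeating for several eigenvectors, or a Lanczos-style estimator.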

Related research

09/30/2022 · Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Traditional analyses of gradient descent show that when the largest eige...

12/12/2018 · Gradient Descent Happens in a Tiny Subspace
We show that in a variety of large-scale deep learning scenarios the gra...

10/07/2022 · Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
Recently, researchers observed that gradient descent for deep neural net...

05/22/2023 · Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
Recent research shows that when Gradient Descent (GD) is applied to neur...

06/06/2019 · Learning in Gated Neural Networks
Gating is a key feature in modern neural networks including LSTMs, GRUs ...

04/24/2022 · The Multiscale Structure of Neural Network Loss Functions: The Effect on Optimization and Origin
Local quadratic approximation has been extensively used to study the opt...

09/05/2013 · Accelerating Hessian-free optimization for deep neural networks by implicit preconditioning and sampling
Hessian-free training has become a popular parallel second-order optim...
