
There is a Singularity in the Loss Landscape

by Mark Lowell, et al.

Despite the widespread adoption of neural networks, their training dynamics remain poorly understood. We show experimentally that as the size of the dataset increases, a point forms where the magnitude of the gradient of the loss becomes unbounded. Gradient descent rapidly brings the network close to this singularity in parameter space, and further training takes place near it. This singularity explains a variety of phenomena recently observed in the Hessian of neural network loss functions, such as training on the edge of stability and the concentration of the gradient in a top subspace. Once the network approaches the singularity, the top subspace contributes little to learning, even though it constitutes the majority of the gradient.
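The claim that most of the gradient lies in a small top subspace of the Hessian, yet that subspace contributes little to learning, can be illustrated on a toy problem. The sketch below is not the paper's experiment: it uses a synthetic quadratic loss whose Hessian has a few dominant eigenvalues (a spiky "outlier" spectrum, as is commonly reported for neural network Hessians) and measures how much of the gradient's norm falls in the top eigenspace. All names and the spectrum shape are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 200, 5  # parameter dimension; size of the "top subspace"

# Synthetic Hessian with k dominant eigenvalues and a small bulk,
# mimicking the spiky spectra observed for neural network losses.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigs = np.concatenate([np.full(k, 50.0), rng.uniform(0.0, 1.0, d - k)])
H = Q @ np.diag(eigs) @ Q.T

w = rng.standard_normal(d)  # current parameters
g = H @ w                   # gradient of the quadratic loss (1/2) w^T H w

# Fraction of the gradient's squared norm inside the top-k eigenspace:
# the large eigenvalues amplify those components, so the gradient
# concentrates there even though w itself is isotropic.
top = Q[:, :k]
frac = np.linalg.norm(top.T @ g) ** 2 / np.linalg.norm(g) ** 2
print(f"gradient mass in top-{k} subspace: {frac:.2f}")
```

On this toy spectrum the top 5 of 200 directions capture nearly all of the gradient's squared norm, which is the concentration effect the abstract attributes to proximity to the singularity.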



