The Multiscale Structure of Neural Network Loss Functions: The Effect on Optimization and Origin

04/24/2022
by Chao Ma et al.

Local quadratic approximation has been extensively used to study the optimization of neural network loss functions around minima. However, it usually holds only in a very small neighborhood of a minimum and cannot explain many phenomena observed during optimization. In this work, we study the structure of neural network loss functions and its implications for optimization in a region beyond the reach of a good quadratic approximation. Numerically, we observe that neural network loss functions possess a multiscale structure, manifested in two ways: (1) in a neighborhood of minima, the loss mixes a continuum of scales and grows subquadratically, and (2) in a larger region, the loss clearly shows several separate scales. Using the subquadratic growth, we are able to explain the Edge of Stability phenomenon [4] observed for the gradient descent (GD) method. Using the separate scales, we explain the working mechanism of learning rate decay through simple examples. Finally, we study the origin of the multiscale structure and propose that the non-uniformity of training data is one of its causes. By constructing a two-layer neural network problem, we show that training data with different magnitudes give rise to different scales of the loss function, producing subquadratic growth or multiple separate scales.
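As a minimal sketch of the subquadratic-growth argument (illustrative parameter choices, not code from the paper), consider gradient descent on the one-dimensional loss f(x) = |x|^p with 1 < p < 2. The curvature f''(x) = p(p-1)|x|^(p-2) diverges near the minimum, so GD with any fixed learning rate eta cannot converge; instead it settles into a stable oscillation:

import numpy as np

# Sketch: GD on the subquadratic loss f(x) = |x|^p, 1 < p < 2.
# Near x = 0 the curvature blows up, so the iterates overshoot;
# far from 0 the curvature is small, so the iterates approach 0.
# The result is a bounded oscillation rather than convergence.

p = 1.5      # subquadratic exponent (assumed value for illustration)
eta = 0.1    # fixed learning rate (assumed value for illustration)

x = 1.0
for step in range(200):
    grad = p * np.sign(x) * np.abs(x) ** (p - 1)        # f'(x)
    x -= eta * grad
    if step % 40 == 0 or step == 199:
        sharpness = p * (p - 1) * np.abs(x) ** (p - 2)  # local curvature f''(x)
        print(f"step {step:3d}  x = {x:+.5f}  sharpness = {sharpness:6.2f}  2/eta = {2 / eta:.1f}")

The iterates fall into a two-cycle of amplitude (eta*p/2)^(1/(2-p)), and at the cycle endpoints the sharpness equals 2(p-1)/eta: it hovers at the scale of the classical 2/eta stability threshold instead of decaying to zero, mirroring the Edge of Stability behavior described in the abstract.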
