LCA: Loss Change Allocation for Neural Network Training

09/03/2019
by Janice Lan, et al.

Neural networks enjoy widespread use, but many aspects of their training, representation, and operation are poorly understood. In particular, our view into the training process is limited, with a single scalar loss being the most common viewport into this high-dimensional, dynamic process. We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters. This measurement is accomplished by decomposing the components of an approximate path integral along the training trajectory using a Runge-Kutta integrator. This rich view shows which parameters are responsible for decreasing or increasing the loss during training, or which parameters "help" or "hurt" the network's learning, respectively. LCA may be summed over training iterations and/or over neurons, channels, or layers for increasingly coarse views. This new measurement device produces several insights into training. (1) We find that barely over 50% of parameters help during any given iteration. (2) Some entire layers hurt overall, moving on average against the training gradient, a phenomenon we hypothesize may be due to phase lag in an oscillatory training process. (3) Finally, increments in learning proceed in a synchronized manner across layers, often peaking on identical iterations.
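The sketch below illustrates how such a measurement could be set up, assuming a PyTorch model trained with plain SGD and using the simplest first-order approximation, LCA_{i,t} ≈ grad_i(θ_t) * (θ_{i,t+1} − θ_{i,t}), rather than the Runge-Kutta quadrature mentioned above; the model, data, and helper names are illustrative and not taken from the authors' code.

```python
# Minimal sketch of per-parameter Loss Change Allocation (LCA) in PyTorch.
# Assumptions (not from the paper's code release): a small MLP on random data,
# plain SGD, and a first-order approximation of the path integral per step.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(256, 10)             # stand-in training batch
y = torch.randint(0, 2, (256,))

def flat(tensors):
    """Flatten a collection of tensors into one detached 1-D vector."""
    return torch.cat([t.detach().reshape(-1) for t in tensors])

lca_total = torch.zeros(sum(p.numel() for p in model.parameters()))

for step in range(100):
    theta_before = flat(model.parameters())        # parameters before the step
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    grad_before = flat([p.grad for p in model.parameters()])
    opt.step()
    theta_after = flat(model.parameters())         # parameters after the step
    # First-order LCA for this step: one scalar per parameter.
    # Negative entries "helped" (pushed the loss down); positive entries "hurt".
    lca_total += grad_before * (theta_after - theta_before)

helped = (lca_total < 0).float().mean().item()
print(f"fraction of parameters that helped overall: {helped:.2f}")
print(f"sum of LCA (approx. total loss change): {lca_total.sum().item():.4f}")
```

Because the per-step terms sum to an approximation of the total loss change, accumulating them over iterations, or over the parameters belonging to a neuron, channel, or layer, yields the coarser views described in the abstract.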


Related research

- Representative Datasets: The Perceptron Case (03/20/2019)
  One of the main drawbacks of the practical use of neural networks is the...
- The Early Phase of Neural Network Training (02/24/2020)
  Recent studies have shown that many important aspects of neural network ...
- Understanding and Improving Group Normalization (07/05/2022)
  Various normalization layers have been proposed to help the training of ...
- Physics-aware Roughness Optimization for Diffractive Optical Neural Networks (04/04/2023)
  As a representative next-generation device/circuit technology beyond CMO...
- Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training (05/30/2022)
  Recent works on sparse neural network training (sparse training) have sh...
- Applications of Koopman Mode Analysis to Neural Networks (06/21/2020)
  We consider the training process of a neural network as a dynamical syst...
- Cockpit: A Practical Debugging Tool for Training Deep Neural Networks (02/12/2021)
  When engineers train deep learning models, they are very much "flying bl...
