Surprising Instabilities in Training Deep Networks and a Theoretical Analysis

06/04/2022
by Yuxin Sun, et al.

We discover restrained numerical instabilities in current training practices of deep networks with stochastic gradient descent (SGD). We show that numerical error (on the order of the smallest floating-point bit) induced by floating-point arithmetic in training deep nets can be amplified significantly and result in test-accuracy variance comparable to the variance due to the stochasticity of SGD. We show that this can likely be traced to instabilities of the optimization dynamics that are restrained, i.e., localized over iterations and over regions of the weight-tensor space. We do this by presenting a theoretical framework based on numerical analysis of partial differential equations (PDEs), and by analyzing the gradient-descent PDE of a simplified convolutional neural network (CNN). We show that this PDE is stable only under certain conditions on the learning rate and weight decay, and we reproduce the localized instabilities in the PDE for the simplified network when those conditions are violated.
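The stability condition on the learning rate mirrors classical step-size restrictions for explicit discretizations of parabolic PDEs. The sketch below is not the paper's analysis (which also involves weight decay); it is a minimal, standard illustration of the analogy: a forward-Euler scheme for the 1-D heat equation u_t = u_xx is stable only when dt <= dx^2/2, and above that bound it amplifies perturbations on the order of floating-point round-off until they dominate the solution. Grid size, step count, and step sizes here are arbitrary choices for illustration.

```python
# Minimal sketch, assuming a standard explicit (forward-Euler) scheme for the
# 1-D heat equation u_t = u_xx -- the textbook analogue of a gradient-descent PDE.
# The scheme is stable only when r = dt/dx^2 <= 0.5 (von Neumann condition); above
# that, round-off-sized perturbations grow geometrically, much as a too-large
# learning rate can amplify floating-point error in training.
import numpy as np

def evolve_heat(u0, dx, dt, steps):
    """Forward-Euler update of u_t = u_xx with fixed (zero) boundary values."""
    u = u0.copy()
    r = dt / dx**2                      # stability parameter; stable iff r <= 0.5
    for _ in range(steps):
        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u

n = 101
dx = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
u0 = np.sin(np.pi * x)                  # smooth initial condition, zero at the ends

for dt in (0.4 * dx**2, 0.6 * dx**2):   # just below / just above the stability bound
    u = evolve_heat(u0, dx, dt, steps=1500)
    print(f"r = {dt / dx**2:.2f} -> max|u| = {np.max(np.abs(u)):.3e}")
# Typical behavior: the r = 0.40 run decays smoothly toward zero, while the
# r = 0.60 run amplifies round-off-level oscillations by hundreds of orders
# of magnitude, even though the initial data and the equation are identical.
```

The contrast between the two runs is the point: the instability is not in the problem but in the discretization, and it is triggered by error at the level of the smallest floating-point bit.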


Related research

02/21/2023  Deep Learning via Neural Energy Descent
This paper proposes the Neural Energy Descent (NED) via neural network e...

11/06/2017  Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks
Deep neural networks are commonly developed and trained in 32-bit floati...

10/03/2022  Limitations of neural network training due to numerical instability of backpropagation
We study the training of deep neural networks by gradient descent where ...

02/09/2022  Deep Neural Networks to Correct Sub-Precision Errors in CFD
Loss of information in numerical simulations can arise from various sour...

04/25/2019  Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations
Although double-precision floating-point arithmetic currently dominates ...

01/19/2019  Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks
Efforts to reduce the numerical precision of computations in deep learni...

04/05/2023  Modeling still matters: a surprising instance of catastrophic floating point errors in mathematical biology and numerical methods for ODEs
We guide the reader on a journey through mathematical modeling and numer...
