Limitations of neural network training due to numerical instability of backpropagation

10/03/2022
by Clemens Karner, et al.

We study the training of deep neural networks by gradient descent, where floating-point arithmetic is used to compute the gradients. In this framework, and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, over the course of gradient-descent training, superlinearly many affine pieces with respect to their number of layers. By contrast, virtually all approximation-theoretic arguments that yield high-order polynomial approximation rates rely on sequences of ReLU neural networks with exponentially many affine pieces relative to their number of layers. We therefore conclude that the approximating sequences of ReLU neural networks produced by gradient descent in practice differ substantially from the theoretically constructed sequences. The assumptions and the theoretical results are compared with a numerical study, which yields concurring results.
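
To make the central quantity concrete, the following is a minimal NumPy sketch (not the authors' code or experimental setup): it trains a small one-dimensional ReLU network by plain gradient descent in float32, with backpropagation carried out in floating-point arithmetic, and then counts the number of affine pieces of the learned function by detecting slope changes on a fine grid. The architecture, learning rate, target function, grid resolution, and slope tolerance are illustrative assumptions, not values from the paper or its numerical study.

# Minimal sketch (illustrative hyperparameters, not the paper's setup):
# train a small 1-D ReLU network with plain gradient descent in float32,
# then count the affine pieces of the learned piecewise-affine function.
import numpy as np

rng = np.random.default_rng(0)
dtype = np.float32  # gradients are computed in floating-point arithmetic

# Two hidden ReLU layers, scalar input and output.
W1 = rng.standard_normal((16, 1)).astype(dtype); b1 = np.zeros(16, dtype)
W2 = rng.standard_normal((16, 16)).astype(dtype) / 4; b2 = np.zeros(16, dtype)
W3 = rng.standard_normal((1, 16)).astype(dtype) / 4; b3 = np.zeros(1, dtype)

def forward(x):
    h1 = np.maximum(W1 @ x + b1[:, None], 0)
    h2 = np.maximum(W2 @ h1 + b2[:, None], 0)
    return W3 @ h2 + b3[:, None], h1, h2

x = np.linspace(-1, 1, 256, dtype=dtype)[None, :]   # training inputs
y = (x ** 2).astype(dtype)                          # smooth target function
n = x.shape[1]
lr = dtype(0.05)

for step in range(2000):
    out, h1, h2 = forward(x)
    err = out - y                                   # dL/d(out) for 0.5 * MSE
    # Backpropagation, entirely in float32.
    gW3 = err @ h2.T / n; gb3 = err.mean(axis=1)
    d2 = (W3.T @ err) * (h2 > 0)
    gW2 = d2 @ h1.T / n; gb2 = d2.mean(axis=1)
    d1 = (W2.T @ d2) * (h1 > 0)
    gW1 = d1 @ x.T / n; gb1 = d1.mean(axis=1)
    W3 -= lr * gW3; b3 -= lr * gb3
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# The trained network is piecewise affine on [-1, 1]; count its pieces by
# counting slope changes between consecutive points of a fine grid.
grid = np.linspace(-1.0, 1.0, 50001)[None, :]       # evaluated in float64
vals = forward(grid)[0].ravel()
slopes = np.diff(vals) / np.diff(grid.ravel())
pieces = 1 + np.count_nonzero(np.abs(np.diff(slopes)) > 1e-6)
print("approximate number of affine pieces:", pieces)

For the deep networks used in approximation-theoretic constructions, the analogous count would have to grow exponentially with depth; the paper's claim is that gradient-descent training in floating point is unlikely to produce or preserve such growth.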

Related research

09/30/2019 - On the convergence of gradient descent for two layer neural networks
It has been shown that gradient descent can yield the zero training loss...

05/27/2019 - Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
Natural gradient descent has proven effective at mitigating the effects ...

08/25/2020 - Stochastic Markov Gradient Descent and Training Low-Bit Neural Networks
The massive size of modern neural networks has motivated substantial rec...

06/04/2022 - Surprising Instabilities in Training Deep Networks and a Theoretical Analysis
We discover restrained numerical instabilities in current training pract...

04/06/2021 - Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity Bounds for Neural Network Approximation Spaces
We study the computational complexity of (deterministic or randomized) a...

01/03/2021 - Algorithmic Complexities in Backpropagation and Tropical Neural Networks
In this note, we propose a novel technique to reduce the algorithmic com...

03/14/2017 - Convergence of Deep Neural Networks to a Hierarchical Covariance Matrix Decomposition
We show that in a deep neural network trained with ReLU, the low-lying l...
