On the influence of roundoff errors on the convergence of the gradient descent method with low-precision floating-point computation

02/24/2022
by Lu Xia, et al.

The employment of stochastic rounding schemes helps prevent stagnation of convergence caused by the vanishing-gradient effect when implementing the gradient descent method in low precision. Conventional stochastic rounding achieves zero bias by preserving small updates with probabilities proportional to their relative magnitudes. In this study, we propose a new stochastic rounding scheme that trades the zero-bias property for a larger probability of preserving small gradients. Our method yields a constant rounding bias that, at each iteration, lies in a descent direction. For convex problems, we prove that the proposed rounding method has a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network in an 8-bit floating-point format.
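
As a minimal sketch of the ideas above (and not the authors' exact scheme), the Python snippet below contrasts conventional, unbiased stochastic rounding with a hypothetical biased variant that rounds small updates away from zero more often. For simplicity it rounds onto a uniform grid of spacing eps rather than an actual 8-bit floating-point format; the functions stochastic_round and biased_stochastic_round and the boost parameter are illustrative assumptions, not quantities from the paper.

import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, eps):
    # Unbiased stochastic rounding: round x onto the grid {k * eps}, rounding
    # up with probability equal to the relative distance to the lower grid
    # point, so that E[round(x)] = x.
    lower = np.floor(x / eps) * eps
    frac = (x - lower) / eps
    up = rng.random(np.shape(x)) < frac
    return lower + up * eps

def biased_stochastic_round(x, eps, boost=2.0):
    # Hypothetical biased variant: inflate the probability of rounding away
    # from zero, so that small-magnitude updates are less likely to be flushed
    # to zero (trading zero bias for better preservation of small gradients).
    lower = np.floor(x / eps) * eps
    frac = (x - lower) / eps
    p_up = np.where(x >= 0.0,
                    np.minimum(boost * frac, 1.0),
                    1.0 - np.minimum(boost * (1.0 - frac), 1.0))
    up = rng.random(np.shape(x)) < p_up
    return lower + up * eps

# A gradient-descent update of -1e-3 is smaller than half the grid spacing
# (eps = 1/64), so round-to-nearest would flush it to zero and the iteration
# would stagnate; both stochastic schemes keep it alive on average.
lr, grad, eps = 1.0, 1e-3, 1.0 / 64
step = np.full(100000, -lr * grad)
print("unbiased mean update:", stochastic_round(step, eps).mean())        # approx -1e-3
print("biased mean update:  ", biased_stochastic_round(step, eps).mean()) # approx -2e-3

In this toy setting the biased variant produces a larger expected step whose bias points in the update (descent) direction, which is the property the paper exploits in its convergence analysis for convex problems.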
