On the Convergence of Stochastic Gradient Descent in Low-precision Number Formats

01/04/2023
by Matteo Cacciola, et al.

Deep learning models dominate almost all artificial intelligence tasks, such as vision, text, and speech processing. Stochastic Gradient Descent (SGD) is the main tool for training such models, and its computations are usually performed in the single-precision floating-point format. The convergence of single-precision SGD normally matches the theoretical results derived over the real numbers, since single-precision arithmetic exhibits negligible rounding error. The numerical error grows, however, when the computations are performed in low-precision number formats, which gives a compelling reason to study SGD convergence adapted to low-precision computation. We present both a deterministic and a stochastic analysis of the SGD algorithm, obtaining bounds that show the effect of the number format. These bounds can serve as guidelines for how SGD convergence is affected when constraints make high-precision computation infeasible.
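The sketch below is not from the paper; it is a minimal illustration of the phenomenon the abstract describes, assuming a simulated low-precision format with a reduced mantissa width. It runs mini-batch SGD on a small least-squares problem twice, once in float64 and once with the iterates rounded to roughly bfloat16-like precision, so the gap between the two final losses reflects the precision-induced error that the paper's bounds quantify. The quantize helper, bit width, and problem setup are all hypothetical choices made for this sketch.

```python
import numpy as np

def quantize(x, mantissa_bits=7):
    """Round x to the nearest value with `mantissa_bits` explicit mantissa
    bits (hypothetical helper; roughly bfloat16-like when mantissa_bits=7).
    The exponent range is left unrestricted for simplicity."""
    mant, exp = np.frexp(x)              # x = mant * 2**exp with 0.5 <= |mant| < 1
    scale = 2.0 ** mantissa_bits
    return np.ldexp(np.round(mant * scale) / scale, exp)

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
b = A @ w_true + 0.01 * rng.normal(size=n)

def sgd(steps=5000, lr=1e-2, batch=32, mantissa_bits=None):
    """Plain mini-batch SGD on 0.5*||Aw - b||^2 / n; optionally store the
    iterate in a simulated low-precision format after every update."""
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        grad = A[idx].T @ (A[idx] @ w - b[idx]) / batch
        w = w - lr * grad
        if mantissa_bits is not None:
            w = quantize(w, mantissa_bits)   # rounding error enters here
    return 0.5 * np.mean((A @ w - b) ** 2)

print("float64 final loss:      ", sgd())
print("low-precision final loss:", sgd(mantissa_bits=7))
```

In a run of this sketch the low-precision variant typically stalls at a higher loss: once lr * grad falls below the rounding granularity of the stored iterate, round-to-nearest discards the update entirely, which is the kind of format-dependent effect that deterministic and stochastic convergence bounds of this type are meant to capture.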


Related research

12/14/2021 · Non Asymptotic Bounds for Optimization via Online Multiplicative Stochastic Gradient Descent
The gradient noise of Stochastic Gradient Descent (SGD) is considered to...

11/09/2022 · Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments
Motivated by neural network training in low-bit floating and fixed-point...

07/03/2016 · Understanding the Energy and Precision Requirements for Online Learning
It is well-known that the precision of data, hyperparameters, and intern...

02/24/2022 · On the influence of roundoff errors on the convergence of the gradient descent method with low-precision floating-point computation
The employment of stochastic rounding schemes helps prevent stagnation o... (see the stochastic rounding sketch after this list)

06/20/2022 · Low-Precision Stochastic Gradient Langevin Dynamics
While low-precision optimization has been widely used to accelerate deep...

04/26/2019 · SWALP: Stochastic Weight Averaging in Low-Precision Training
Low precision operations can provide scalability, memory savings, portab...

04/13/2017 · Fully Distributed and Asynchronized Stochastic Gradient Descent for Networked Systems
This paper considers a general data-fitting problem over a networked sys...
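On the 02/24/2022 entry above: stochastic rounding replaces round-to-nearest with a randomized rule that rounds up or down with probability proportional to proximity, so the rounding error is zero in expectation and small updates survive on average rather than being rounded away. The sketch below is a minimal illustration of that rule under the same simulated-format assumptions as before, not code from any of the listed papers; the function name and bit width are hypothetical.

```python
import numpy as np

def stochastic_round(x, mantissa_bits=7, rng=np.random.default_rng()):
    """Stochastically round x onto a grid with `mantissa_bits` explicit
    mantissa bits: round up with probability equal to the fractional
    distance to the upper grid point, so E[stochastic_round(x)] == x."""
    mant, exp = np.frexp(x)                  # x = mant * 2**exp
    scaled = mant * 2.0 ** mantissa_bits
    low = np.floor(scaled)
    round_up = rng.random(np.shape(x)) < (scaled - low)
    return np.ldexp((low + round_up) / 2.0 ** mantissa_bits, exp)

# A tiny update that round-to-nearest would discard, but that stochastic
# rounding preserves in expectation:
w, update = 1.0, 1e-4
print(np.mean([stochastic_round(w + update) for _ in range(10000)]))  # ~1.0001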
