Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments

11/09/2022
by Michael R. Metel et al.

Motivated by neural network training in low-bit floating-point and fixed-point environments, this work studies the convergence of variants of SGD under computational error. Considering a general stochastic Lipschitz continuous loss function, a novel convergence result to a Clarke stationary point is presented, assuming that only an approximation of its stochastic gradient can be computed and that the SGD step itself is computed with error. Different variants of SGD are then tested empirically in a variety of low-precision arithmetic environments, achieving improved test set accuracy compared to SGD on two image recognition tasks.
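The setting described above (an approximate stochastic gradient plus error in computing the update itself) can be pictured with a minimal sketch. The uniform fixed-point quantizer, bit width, and scale below are illustrative assumptions, not the paper's actual rounding model or algorithm.

```python
import numpy as np

def quantize(x, num_bits=8, scale=2.0**-6):
    """Hypothetical uniform fixed-point quantizer: round to a grid of step
    `scale` and clip to the range of a signed `num_bits` format."""
    q_max = scale * (2 ** (num_bits - 1) - 1)
    return np.clip(np.round(x / scale) * scale, -q_max, q_max)

def low_precision_sgd_step(w, stochastic_grad, lr, num_bits=8, scale=2.0**-6):
    """One SGD step with two sources of computational error:
    (1) only a quantized approximation of the stochastic gradient is available,
    (2) the update w - lr * g is itself stored in low precision."""
    g_hat = quantize(stochastic_grad(w), num_bits, scale)  # gradient error
    return quantize(w - lr * g_hat, num_bits, scale)       # step error

# Toy usage: minimize E[(w - z)^2] with noisy samples z ~ N(1, 0.1^2).
rng = np.random.default_rng(0)
w = np.array([0.0])
for _ in range(200):
    grad = lambda v: 2 * (v - (1 + 0.1 * rng.standard_normal()))
    w = low_precision_sgd_step(w, grad, lr=0.05)
print(w)  # close to 1, up to quantization error
```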

Related research

On the Convergence of Stochastic Gradient Descent in Low-precision Number Formats (01/04/2023)
Escaping Saddle Points with Compressed SGD (05/21/2021)
Semi-Implicit Back Propagation (02/10/2020)
A Flatter Loss for Bias Mitigation in Cross-dataset Facial Age Estimation (10/20/2020)
ADASS: Adaptive Sample Selection for Training Acceleration (06/11/2019)
Low-Precision Stochastic Gradient Langevin Dynamics (06/20/2022)
Understanding the Energy and Precision Requirements for Online Learning (07/03/2016)
