SWALP: Stochastic Weight Averaging in Low-Precision Training

04/26/2019
by Guandao Yang et al.

Low-precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low-precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than that of low-precision SGD in strongly convex settings.
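
To make the recipe concrete, below is a minimal NumPy sketch (not the authors' released code) of the core idea on a toy quadratic objective: run SGD with weights and gradients stochastically rounded to an 8-bit fixed-point grid, then keep a full-precision running average of the low-precision iterates after a warmup period. The fixed-point quantizer, the learning rate, and the warmup/averaging schedule here are illustrative assumptions standing in for the paper's low-precision number format and modified learning rate schedule.

```python
import numpy as np

def quantize(x, bits=8, frac_bits=6, rng=np.random.default_rng(0)):
    """Stochastic rounding onto a signed fixed-point grid with `bits` total bits.
    A simplified stand-in for the low-precision format used in the paper."""
    scale = 2.0 ** frac_bits
    limit = 2.0 ** (bits - 1) - 1
    scaled = x * scale
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional remainder (stochastic rounding).
    rounded = floor + (rng.random(x.shape) < (scaled - floor))
    return np.clip(rounded, -limit - 1, limit) / scale

def swalp_quadratic(A, b, w0, lr=0.05, steps=2000, avg_start=500, noise=0.1,
                    rng=np.random.default_rng(1)):
    """Toy SWALP-style loop on the quadratic objective 0.5*w^T A w - b^T w.
    Low-precision SGD iterates are averaged in full precision after a warmup."""
    w = quantize(w0)
    w_avg, n_avg = np.zeros_like(w), 0
    for t in range(steps):
        grad = A @ w - b + noise * rng.standard_normal(w.shape)  # noisy gradient
        grad = quantize(grad)                   # quantize the gradient
        w = quantize(w - lr * grad)             # keep the iterate in low precision
        if t >= avg_start:                      # start averaging after warmup
            n_avg += 1
            w_avg += (w - w_avg) / n_avg        # running mean of low-precision iterates
    return w, w_avg

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = np.diag([1.0, 0.5, 0.25])
    w_star = rng.standard_normal(3)             # known minimizer
    b = A @ w_star
    w_sgd, w_swalp = swalp_quadratic(A, b, w0=np.zeros(3), rng=rng)
    print("||w_SGD   - w*|| =", np.linalg.norm(w_sgd - w_star))
    print("||w_SWALP - w*|| =", np.linalg.norm(w_swalp - w_star))
```

On this toy problem the averaged iterate typically lands closer to the minimizer than the final low-precision SGD iterate, illustrating the noise-ball comparison stated in the abstract.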


Related research

High-Accuracy Low-Precision Training (03/09/2018)
Low-precision computation is often used to lower the time and energy cos...

Low-Precision Stochastic Gradient Langevin Dynamics (06/20/2022)
While low-precision optimization has been widely used to accelerate deep...

Learning low-precision neural networks without Straight-Through Estimator (STE) (03/04/2019)
The Straight-Through Estimator (STE) is widely used for back-propagating...

On the Convergence of Stochastic Gradient Descent in Low-precision Number Formats (01/04/2023)
Deep learning models are dominating almost all artificial intelligence t...

Understanding the Energy and Precision Requirements for Online Learning (07/03/2016)
It is well-known that the precision of data, hyperparameters, and intern...

A Simple Baseline for Bayesian Uncertainty in Deep Learning (02/07/2019)
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose ...

Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference (09/11/2018)
To realize the promise of ubiquitous embedded deep network inference, it...
