High-Accuracy Low-Precision Training

03/09/2018
by Christopher De Sa, et al.

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference, not training. Previous low-precision training algorithms suffered from a fundamental tradeoff: as the number of bits of precision is lowered, quantization noise is added to the model, which limits statistical accuracy. To address this issue, we describe a simple low-precision stochastic gradient descent variant called HALP. HALP converges at the same theoretical rate as full-precision algorithms despite the noise introduced by using low precision throughout execution. The key idea is to use SVRG to reduce gradient variance, and to combine this with a novel technique called bit centering to reduce quantization error. We show that on the CPU, HALP can run up to 4× faster than full-precision SVRG and can match its convergence trajectory. We implemented HALP in TensorQuant, and show that it exceeds the validation performance of plain low-precision SGD on two deep learning tasks.
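
The abstract packs the method into two sentences, so a sketch helps make "SVRG plus bit centering" concrete. Below is a minimal NumPy sketch of the idea, not the authors' implementation: the names (halp_sketch, quantize, grad, grad_i), the hyperparameters, and the round-to-nearest quantizer are assumptions (the paper uses stochastic rounding), while the fixed-point scale follows the paper's bit-centering argument that the optimum lies within a ball of radius ||∇f(w̃)||/μ around the snapshot w̃.

```python
import numpy as np

def quantize(x, scale, bits=8):
    """Saturating fixed-point quantization with step size `scale`.
    (HALP uses stochastic rounding; round-to-nearest keeps this short.)"""
    limit = scale * (2 ** (bits - 1) - 1)
    return np.clip(np.round(x / scale) * scale, -limit, limit)

def halp_sketch(grad, grad_i, w0, n, alpha, mu, bits=8, epochs=20, T=None, seed=0):
    """SVRG outer loop with bit-centered, low-precision inner updates."""
    T = T or 2 * n
    rng = np.random.default_rng(seed)
    w_tilde = np.asarray(w0, dtype=np.float64).copy()
    for _ in range(epochs):
        g_tilde = grad(w_tilde)  # full-precision snapshot gradient, once per epoch
        # Bit centering: store only the offset z = w - w_tilde in fixed point.
        # The scale is chosen so the representable box covers the ball of
        # radius ||g_tilde|| / mu that must contain the optimum, so the
        # effective resolution improves as the snapshot converges.
        scale = np.linalg.norm(g_tilde) / (mu * (2 ** (bits - 1) - 1)) + 1e-12
        z = np.zeros_like(w_tilde)
        for _ in range(T):
            i = rng.integers(n)
            v = grad_i(i, w_tilde + z) - grad_i(i, w_tilde) + g_tilde  # SVRG direction
            z = quantize(z - alpha * v, scale, bits)  # low-precision update
        w_tilde = w_tilde + z  # recenter before the next epoch
    return w_tilde

# Toy usage: least-squares regression, f(w) = ||Xw - y||^2 / (2n).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.01 * rng.normal(size=n)
    grad = lambda w: X.T @ (X @ w - y) / n
    grad_i = lambda i, w: X[i] * (X[i] @ w - y[i])
    mu = np.linalg.eigvalsh(X.T @ X / n).min()  # strong-convexity constant
    w = halp_sketch(grad, grad_i, np.zeros(d), n, alpha=0.05, mu=mu, bits=8)
    print("gradient norm at solution:", np.linalg.norm(grad(w)))
```

Because the scale shrinks with the snapshot gradient, the same number of bits buys progressively finer resolution as training converges, which is what lets HALP match the convergence rate of full-precision algorithms.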

Related research

06/20/2022  Low-Precision Stochastic Gradient Langevin Dynamics
04/26/2019  SWALP: Stochastic Weight Averaging in Low-Precision Training
07/20/2022  Quantized Training of Gradient Boosting Decision Trees
11/16/2016  The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
03/04/2019  Learning Low-Precision Neural Networks without Straight-Through Estimator (STE)
03/08/2019  Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning
03/08/2019  Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning (Technical Report)
