Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks

01/19/2019
by Charbel Sakr, et al.

Efforts to reduce the numerical precision of computations in deep learning training have yielded systems that aggressively quantize weights and activations, yet employ wide high-precision accumulators for partial sums in inner-product operations to preserve the quality of convergence. The absence of any framework for analyzing the precision requirements of partial-sum accumulation results in conservative design choices, imposing an upper bound on how much the complexity of multiply-accumulate units can be reduced. We present a statistical approach to analyzing the impact of reduced accumulation precision on deep learning training. Observing that a poor choice of accumulation precision causes a loss of information that manifests itself as a reduction in the variance of an ensemble of partial sums, we derive a set of equations that relate this variance to the length of accumulation and the minimum number of bits needed for accumulation. We apply our analysis to three benchmark networks: CIFAR-10 ResNet 32, ImageNet ResNet 18, and ImageNet AlexNet. In each case, with accumulation precision set according to our proposed equations, the networks converge to the single-precision floating-point baseline. We also show that reducing accumulation precision further degrades the quality of the trained network, demonstrating that our equations produce tight bounds. Overall, this analysis enables precise tailoring of computation hardware to the application, yielding area- and power-optimal systems.
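As a rough illustration of the phenomenon the abstract describes (not the paper's actual derivation; the helper names `round_to_mantissa` and `sum_variance`, the mantissa widths, and the accumulation length below are arbitrary choices made for this sketch), the following code accumulates random addends while rounding the running partial sum to a limited number of mantissa bits, then reports the variance of the resulting ensemble of sums. When the accumulator is too narrow, small addends are swamped by the large running sum and the measured variance falls below the ideal value n, which is exactly the loss of information the proposed equations are meant to bound.

```python
import math
import random

def round_to_mantissa(x, m_bits):
    """Round x to the nearest value in a toy float format with m_bits fraction bits."""
    if x == 0.0:
        return 0.0
    mant, exp = math.frexp(x)              # x = mant * 2**exp, with 0.5 <= |mant| < 1
    mant = round(mant * 2**m_bits) / 2**m_bits
    return math.ldexp(mant, exp)

def low_precision_sum(values, m_bits):
    """Accumulate values left to right, rounding the partial sum after every add."""
    acc = 0.0
    for v in values:
        acc = round_to_mantissa(acc + v, m_bits)
    return acc

def sum_variance(n, m_bits, trials=500, seed=0):
    """Empirical variance of an ensemble of length-n reduced-precision sums."""
    rng = random.Random(seed)
    sums = [low_precision_sum([rng.gauss(0.0, 1.0) for _ in range(n)], m_bits)
            for _ in range(trials)]
    mean = sum(sums) / trials
    return sum((s - mean) ** 2 for s in sums) / trials

n = 4096                                   # accumulation (dot-product) length
for m_bits in (23, 12, 8, 5):              # 23 fraction bits ~ fp32 accumulation
    print(f"mantissa bits = {m_bits:2d}  "
          f"variance of sums = {sum_variance(n, m_bits):7.1f}  (ideal = n = {n})")
```

With i.i.d. standard-normal addends the exact sum has variance n, so the gap between the printed variance and n grows as the mantissa shrinks or the accumulation length increases, mirroring the trade-off between accumulation length and minimum accumulation bit-width that the paper formalizes.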


