Rethinking floating point for deep learning

11/01/2018
by Jeff Johnson et al.

Reducing the hardware overhead of neural networks for faster or lower-power inference and training is an active area of research. Uniform quantization using integer multiply-add has been thoroughly investigated, but it requires learning many quantization parameters, fine-tuning of training, or other prerequisites. Little effort has been made to improve floating point relative to this baseline; it remains energy inefficient, and reducing the word size causes a drastic loss in needed dynamic range. We improve floating point so that it is more energy efficient than equivalent bit-width integer hardware on a 28 nm ASIC process while retaining accuracy in 8 bits, using a novel hybrid log multiply/linear add, Kulisch accumulation, and tapered encodings from Gustafson's posit format. With no network retraining, and with drop-in replacement of all math and float32 parameters via round-to-nearest-even only, this open-sourced 8-bit log float is within 0.9% top-1 accuracy of the original float32 ResNet-50 model on ImageNet. Unlike int8 quantization, it remains a general-purpose floating-point arithmetic, interpretable out of the box. Our 8/38-bit log float multiply-add is synthesized and power profiled at 28 nm at 0.96x the power and 1.12x the area of 8/32-bit integer multiply-add. In 16 bits, our log float multiply-add is 0.59x the power and 0.68x the area of IEEE 754 float16 fused multiply-add, maintaining the same significand precision and dynamic range, proving useful for training ASICs as well.
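The core arithmetic idea in the abstract (multiply in the log domain, then accumulate linearly and exactly in a wide Kulisch-style fixed-point register) can be illustrated with a short sketch. The Python below is a minimal, purely illustrative model of that data flow, not the paper's posit-tapered hardware encoding; the bit widths and helper names (encode, log_multiply, LOG_FRAC_BITS, ACC_FRAC_BITS) are assumptions chosen for clarity.

```python
# Illustrative sketch of the hybrid "log multiply / linear add" idea:
# values are stored as sign + fixed-point base-2 logarithm, so a multiply
# is just an integer addition of log values; each product is converted back
# to linear and summed exactly in a wide fixed-point (Kulisch-style)
# accumulator. Bit widths and names are assumptions, not the paper's design.

import math

LOG_FRAC_BITS = 4      # fractional bits of the fixed-point log representation
ACC_FRAC_BITS = 20     # fractional bits of the wide linear accumulator

def encode(x: float):
    """Encode a nonzero float as (sign, fixed-point log2|x|)."""
    sign = -1 if x < 0 else 1
    return sign, round(math.log2(abs(x)) * (1 << LOG_FRAC_BITS))

def log_multiply(a, b):
    """Multiply two log-domain numbers: multiply signs, add fixed-point logs."""
    (sa, la), (sb, lb) = a, b
    return sa * sb, la + lb

def to_linear_fixed(sign, log_fixed):
    """Convert a log-domain product to a wide fixed-point linear value."""
    value = 2.0 ** (log_fixed / (1 << LOG_FRAC_BITS))
    return sign * round(value * (1 << ACC_FRAC_BITS))

def dot(xs, ys):
    """Dot product: log-domain multiplies, exact linear accumulation."""
    acc = 0  # wide integer accumulator; every addition is exact
    for x, y in zip(xs, ys):
        s, l = log_multiply(encode(x), encode(y))
        acc += to_linear_fixed(s, l)
    return acc / (1 << ACC_FRAC_BITS)

# Exact answer is 6.5; this prints roughly 6.4 because of the coarse log rounding.
print(dot([1.5, -2.0, 0.75], [4.0, 0.5, 2.0]))
```

Because the accumulator is a plain wide integer, the additions inside the dot product are exact; all rounding error comes from the coarse fixed-point log representation used for the multiplies, which is the trade-off the hybrid scheme exploits.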


Related research

AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference (09/29/2019)
Conventional hardware-friendly quantization methods, such as fixed-point...

Efficient, arbitrarily high precision hardware logarithmic arithmetic for linear algebra (04/17/2020)
The logarithmic number system (LNS) is arguably not broadly used due to ...

K-TanH: Hardware Efficient Activations For Deep Learning (09/17/2019)
We propose K-TanH, a novel, highly accurate, hardware efficient approxim...

Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines (05/21/2018)
Deep learning as a means to inferencing has proliferated thanks to its v...

Alpha Blending with No Division Operations (02/06/2022)
Highly accurate alpha blending can be performed entirely with integer op...

Efficient Representation of Large-Alphabet Probability Distributions (05/08/2022)
A number of engineering and scientific problems require representing and...

StatAssist GradBoost: A Study on Optimal INT8 Quantization-aware Training from Scratch (06/17/2020)
This paper studies the scratch training of quantization-aware training (...
