Mixed Precision Training With 8-bit Floating Point

05/29/2019
by Naveen Mellempudi et al.

Reduced-precision computation for deep neural networks is one of the key areas addressing the widening compute gap driven by exponential growth in model size. In recent years, deep learning training has largely migrated to 16-bit precision, with significant gains in performance and energy efficiency. However, attempts to train DNNs at 8-bit precision have met with significant challenges because of the higher precision and dynamic range requirements of back-propagation. In this paper, we propose a method to train deep neural networks using an 8-bit floating point representation for weights, activations, errors, and gradients. In addition to reducing compute precision, we also reduce the precision requirement for the master copy of weights from 32-bit to 16-bit. We demonstrate state-of-the-art accuracy across multiple datasets (ImageNet-1K, WMT16) and a broader set of workloads (ResNet-18/34/50, GNMT, Transformer) than previously reported. We propose an enhanced loss scaling method to augment the reduced subnormal range of 8-bit floating point for improved error propagation. We also examine the effect of quantization noise on generalization and propose a stochastic rounding technique to address gradient noise. Applying all of these techniques, we report slightly higher validation accuracy than the full-precision baseline.
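To make the two numerical techniques above concrete, the following is a minimal NumPy sketch of FP8 quantization with stochastic rounding, followed by the loss scaling idea. The format parameters (1 sign, 5 exponent, 2 mantissa bits), the function name quantize_fp8, and the scale constant 2^13 are illustrative assumptions, not the paper's implementation.

import numpy as np

# Illustrative FP8 format: 1 sign, 5 exponent, 2 mantissa bits (an
# assumption for this sketch; the paper defines its own format).
EXP_BITS, MAN_BITS = 5, 2
EXP_BIAS = 2 ** (EXP_BITS - 1) - 1            # 15
MAX_EXP = (2 ** EXP_BITS - 2) - EXP_BIAS      # 15, all-ones exponent reserved
MIN_EXP = 1 - EXP_BIAS                        # -14, smallest normal exponent
MAX_VAL = (2.0 - 2.0 ** -MAN_BITS) * 2.0 ** MAX_EXP  # largest finite value

rng = np.random.default_rng(0)

def quantize_fp8(x, stochastic=True):
    """Quantize float values onto a simulated FP8 grid."""
    x = np.asarray(x, dtype=np.float64)
    sign, mag = np.sign(x), np.abs(x)
    # Per-element exponent, clamped so values below the normal range use
    # the smallest normal exponent's grid spacing.
    exp = np.clip(np.floor(np.log2(np.maximum(mag, 2.0 ** MIN_EXP))),
                  MIN_EXP, MAX_EXP)
    ulp = 2.0 ** (exp - MAN_BITS)             # grid spacing at this exponent
    scaled = mag / ulp
    if stochastic:
        # Round up with probability equal to the fractional remainder,
        # so the quantization error is zero in expectation.
        rounded = np.floor(scaled + rng.random(np.shape(scaled)))
    else:
        rounded = np.rint(scaled)
    # Values beyond the format's range saturate at the largest magnitude.
    return (sign * np.minimum(rounded * ulp, MAX_VAL)).astype(np.float32)

# Loss scaling idea: an error value below the smallest FP8 magnitude is
# flushed to zero unless it is scaled up before quantization and the
# matching gradient is scaled back down afterwards. SCALE is illustrative.
SCALE = 2.0 ** 13
err = 3e-7
print(quantize_fp8(err, stochastic=False))                   # 0.0 -- underflows
print(quantize_fp8(err * SCALE, stochastic=False) / SCALE)   # ~3e-7 -- survives

Stochastic rounding keeps the accumulated weight updates unbiased even when individual gradients are smaller than the grid spacing, while loss scaling shifts small error values above the format's smallest representable magnitude before they are quantized.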
