Representation range needs for 16-bit neural network training

03/29/2021
by Valentina Popescu, et al.

Deep learning has grown rapidly thanks to its state-of-the-art performance across a wide range of real-world applications. While neural networks have traditionally been trained using IEEE-754 binary32 arithmetic, the rapid growth of computational demands in deep learning has boosted interest in faster, low precision training. Mixed-precision training that combines IEEE-754 binary16 with IEEE-754 binary32 has been tried, and other 16-bit formats, for example Google's bfloat16, have become popular. In floating-point arithmetic there is a tradeoff between precision and representation range as the number of exponent bits changes; denormal numbers extend the representation range. This raises questions of how much exponent range is needed, of whether there is a format between binary16 (5 exponent bits) and bfloat16 (8 exponent bits) that works better than either of them, and of whether or not denormals are necessary. In the current paper we study the need for denormal numbers in mixed-precision training, and we propose a 1/6/9 format, i.e., 6-bit exponent and 9-bit explicit mantissa, that offers a better range-precision tradeoff. We show that 1/6/9 mixed-precision training is able to speed up training on hardware that incurs a performance slowdown on denormal operations, or to eliminate the need for denormal numbers altogether. And, for a number of fully connected and convolutional neural networks in computer vision and natural language processing, 1/6/9 achieves numerical parity with standard mixed-precision.
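As a rough illustration of the range-precision tradeoff the abstract describes, the short Python sketch below tabulates the largest and smallest representable magnitudes and the relative precision of binary16 (1/5/10), the proposed 1/6/9, and bfloat16 (1/8/7). It assumes an IEEE-754-style encoding (bias 2^(e-1)-1, implicit leading mantissa bit, all-ones exponent reserved for Inf/NaN); the paper's exact encoding choices for 1/6/9 are not reproduced here, so the numbers are indicative only.

```python
# Illustrative sketch, assuming an IEEE-754-style encoding with bias = 2^(e-1) - 1,
# an implicit leading mantissa bit, and the all-ones exponent reserved for Inf/NaN.
# The paper's actual 1/6/9 encoding details may differ.

def format_limits(exp_bits: int, mant_bits: int):
    """Return key limits of a 1/e/m floating-point format."""
    bias = 2 ** (exp_bits - 1) - 1
    max_normal = (2.0 - 2.0 ** -mant_bits) * 2.0 ** bias  # largest finite value
    min_normal = 2.0 ** (1 - bias)                        # smallest normal value
    min_denormal = 2.0 ** (1 - bias - mant_bits)          # smallest denormal value
    epsilon = 2.0 ** -mant_bits                           # spacing just above 1.0
    return max_normal, min_normal, min_denormal, epsilon

formats = {
    "binary16 (1/5/10)": (5, 10),
    "proposed 1/6/9":    (6, 9),
    "bfloat16 (1/8/7)":  (8, 7),
}

print(f"{'format':<20}{'max normal':>12}{'min normal':>12}{'min denorm':>12}{'epsilon':>10}")
for name, (e, m) in formats.items():
    mx, mn, md, eps = format_limits(e, m)
    print(f"{name:<20}{mx:12.3e}{mn:12.3e}{md:12.3e}{eps:10.2e}")
```

Under these assumptions, 1/6/9 reaches about 4.3e9 at the top and about 9.3e-10 before denormals kick in, sitting between binary16 and bfloat16 in range while keeping two more mantissa bits than bfloat16.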

