Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations

by   Greg Henry, et al.

In recent years, fused multiply-add (FMA) units with lower-precision multiplication and higher-precision accumulation have proven useful in machine learning and artificial intelligence applications, most notably in training deep neural networks due to their extreme computational intensity. Compared to classical IEEE-754 32-bit (FP32) and 64-bit (FP64) arithmetic, this reduced-precision arithmetic can naturally be sped up disproportionately to its shortened width, and all major hardware vendors are aggressively enhancing its performance further. One particular FMA operation, which multiplies two BF16 numbers while accumulating in FP32, has been found useful in deep learning; here BF16 is the 16-bit floating-point datatype with the numerical range of IEEE FP32 but only 8 significant bits of precision. In this paper, we examine the use of this FMA unit to implement higher-precision matrix routines in terms of potential performance gain and implications for accuracy. We demonstrate how a decomposition into multiple lower-precision datatypes can be used to assemble a high-precision result, leveraging the FMA unit's higher-precision accumulation. We first show that computations of vector inner products, and by natural extension matrix-matrix products, can be achieved by decomposing FP32 numbers into several BF16 numbers, followed by appropriate computations that accommodate the dynamic range and preserve accuracy compared to standard FP32 computations, while projecting up to a 5.2x speed-up. Furthermore, we examine the solution of linear equations formulated in residual form, which allows for iterative refinement. We demonstrate that the solutions obtained are comparable to those computed in FP64 over a large range of linear-system condition numbers.
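The core idea — splitting an FP32 value into a sum of BF16 terms and recovering near-FP32 accuracy from BF16 multiplies with FP32 accumulation — can be sketched in software. The snippet below is a minimal illustration, not the paper's implementation: it emulates BF16 by truncating the low 16 bits of an FP32 bit pattern, splits each input into two BF16 terms (x ≈ hi + lo), and computes an inner product from the three significant partial products, accumulating in full precision. All function names here are illustrative.

```python
import struct

def to_bf16(x: float) -> float:
    """Emulate BF16 by truncating the lower 16 bits of the FP32
    bit pattern (round-toward-zero), keeping 8 significand bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def split_bf16(x: float):
    """Decompose an FP32 value into two BF16 terms: x ~ hi + lo,
    where lo captures the bits hi could not represent."""
    hi = to_bf16(x)
    lo = to_bf16(x - hi)
    return hi, lo

def dot_bf16x2(xs, ys):
    """Inner product via the two-term decomposition. Each product
    x*y expands into hi*hi + hi*lo + lo*hi; the tiny lo*lo term is
    dropped. Partial products accumulate at higher precision, as a
    BF16-multiply / FP32-accumulate FMA unit would."""
    acc = 0.0
    for x, y in zip(xs, ys):
        xh, xl = split_bf16(x)
        yh, yl = split_bf16(y)
        acc += xh * yh          # dominant product
        acc += xh * yl + xl * yh  # correction terms
    return acc
```

Because each BF16 term carries about 8 significand bits, the two-term split recovers roughly 16 bits per input, so the combined inner product tracks a standard FP32 dot product far more closely than a naive single-BF16 computation would.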


