End-to-End DNN Training with Block Floating Point Arithmetic

04/04/2018
by Mario Drumond, et al.

DNNs are ubiquitous datacenter workloads, requiring orders of magnitude more computing power from servers than traditional workloads. As such, datacenter operators are forced to adopt domain-specific accelerators that employ half-precision floating-point (FP) numeric representations to improve arithmetic density. Unfortunately, even these representations are not dense enough and are therefore sub-optimal for DNNs. We propose a hybrid approach that employs dense block floating-point (BFP) arithmetic for dot product computations and FP arithmetic elsewhere. Using BFP improves the performance of dot product operations, which constitute most DNN computations, while letting values float freely between dot product operations enables a better choice of tensor exponents when converting values back to BFP. We show that models trained with hybrid BFP-FP arithmetic either match or outperform their FP32 counterparts, leading to more compact models and denser arithmetic in computing platforms.
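To make the hybrid scheme concrete, below is a minimal sketch in Python/NumPy of what BFP quantization and a BFP dot product could look like: every value in a block shares a single exponent, the dot product reduces to integer multiply-accumulates on the mantissas, and a single FP scaling recovers the result. The helper names (to_bfp, bfp_dot), the mantissa width, and the exponent-selection rule here are illustrative assumptions, not the paper's actual implementation, which targets hardware accelerators and per-tile blocking.

import numpy as np

def to_bfp(x, mantissa_bits=8):
    # Quantize a 1-D tensor to block floating point: one shared exponent
    # for the whole block, signed integer mantissas for each element.
    # Returns (mantissas, exponent) with x ~= mantissas * 2**exponent.
    max_mag = np.max(np.abs(x))
    if max_mag == 0:
        return np.zeros_like(x, dtype=np.int64), 0
    # Pick the exponent from the largest magnitude so every mantissa
    # fits in mantissa_bits signed bits (illustrative choice).
    exp = int(np.floor(np.log2(max_mag))) - (mantissa_bits - 2)
    limit = 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.round(x / 2.0 ** exp), -limit, limit).astype(np.int64)
    return mant, exp

def bfp_dot(a, b, mantissa_bits=8):
    # Dot product in BFP: exact integer multiply-accumulate on the
    # mantissas, then one FP scaling by the combined exponent.
    ma, ea = to_bfp(a, mantissa_bits)
    mb, eb = to_bfp(b, mantissa_bits)
    acc = np.dot(ma, mb)
    return float(acc) * 2.0 ** (ea + eb)

# Usage: compare against the FP32 reference dot product.
rng = np.random.default_rng(0)
a = rng.standard_normal(256).astype(np.float32)
b = rng.standard_normal(256).astype(np.float32)
print(bfp_dot(a, b), float(np.dot(a, b)))

Note that everything inside the reduction is fixed-point arithmetic; only the final scaling touches FP hardware, which is what makes BFP dot products denser than their FP counterparts, while the FP values flowing between dot products keep per-tensor dynamic range intact.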


research
04/04/2018

Training DNNs with Hybrid Block Floating Point

The wide adoption of DNNs has given birth to unrelenting computing requi...
research
11/19/2022

Accuracy Boosters: Epoch-Driven Mixed-Mantissa Block Floating-Point for DNN Training

The unprecedented growth in DNN model complexity, size and the amount of...
research
11/22/2022

Representations of the symmetric group are decomposable in polynomial time

We introduce an algorithm to decompose orthogonal matrix representations...
research
05/11/2023

Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing

The accuracy requirements in many scientific computing workloads result ...
research
01/28/2018

BOPS, Not FLOPS! A New Metric and Roofline Performance Model For Datacenter Computing

The past decades have witnessed FLOPS (Floating-point Operations per Second) as...
research
06/03/2021

When does the Lanczos algorithm compute exactly?

In theory, the Lanczos algorithm generates an orthogonal basis of the co...
research
02/18/2019

ENBB Processor: Towards the ExaScale Numerical Brain Box [Position Paper]

ExaScale systems will be a key driver for simulations that are essential...
