Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators

01/27/2021
by   Hamzah Abdel-Aziz, et al.

In this paper, we propose a mixed-precision convolution unit architecture that supports different integer and floating-point (FP) precisions. The proposed architecture is based on low-bit inner product units and realizes higher precision through temporal decomposition. We illustrate how to integrate FP computations on an integer-based architecture and evaluate the overheads incurred by FP arithmetic support. We argue that the alignment and addition overhead for an FP inner product can be significant, since the maximum exponent difference could be up to 58 bits, which results in large alignment logic. To address this issue, we show empirically that no more than 26 product bits are required and that up to 8 bits of alignment are sufficient in most inference cases. We present novel optimizations based on these observations to reduce the FP arithmetic hardware overheads. Our empirical results, based on simulation and hardware implementation, show a significant reduction in FP16 overhead. Over a typical mixed-precision implementation, the proposed architecture achieves area improvements of up to 25%, and power-efficiency improvements of up to 46% in TFLOPS/W and up to 63% in TOPS/W.
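To make the alignment argument concrete, here is a minimal Python sketch (not the paper's hardware design) that emulates an FP16 dot product in which each product is aligned to the largest exponent seen so far, the alignment shift is capped, and the aligned product is truncated to a fixed width. The constants MAX_ALIGN and PROD_BITS, the limited_alignment_dot helper, and the random test vectors are illustrative assumptions chosen to mirror the 8-bit alignment and 26-bit product widths discussed above.

```python
import math
import numpy as np

# Illustrative parameters (assumptions), mirroring the widths discussed in the abstract.
MAX_ALIGN = 8     # maximum alignment shift, in bits
PROD_BITS = 26    # bits kept for each product significand after alignment

def limited_alignment_dot(a, b):
    """FP16 dot product with a capped alignment shift and truncated products."""
    a = np.asarray(a, dtype=np.float16)
    b = np.asarray(b, dtype=np.float16)
    acc = 0.0        # running sum, reconstructed as a Python float for readability
    acc_exp = None   # exponent the accumulator is currently aligned to
    for x, y in zip(a, b):
        p = float(x) * float(y)       # exact product of two fp16 values fits in float64
        if p == 0.0:
            continue
        _, e = math.frexp(p)          # p = m * 2**e with 0.5 <= |m| < 1
        if acc_exp is None or e > acc_exp:
            acc_exp = e               # re-align to the new, larger exponent
        shift = acc_exp - e           # right-shift this product would need in hardware
        if shift > MAX_ALIGN:
            continue                  # outside the alignment window: the product is dropped
        # Keep only PROD_BITS bits of the aligned product (truncate toward zero).
        scale = 2.0 ** (acc_exp - PROD_BITS)
        acc += math.trunc(p / scale) * scale
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(1024).astype(np.float16)
    b = rng.standard_normal(1024).astype(np.float16)
    exact = float(np.dot(a.astype(np.float64), b.astype(np.float64)))
    approx = limited_alignment_dot(a, b)
    print(f"exact={exact:+.6f}  capped-alignment={approx:+.6f}  abs.err={abs(exact - approx):.2e}")
```

Comparing the capped-alignment result against an exact accumulation on representative data is the kind of empirical check that the limited-alignment observation rests on; the sketch is intentionally optimistic about accumulator precision and only models the product alignment and truncation path.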

Related research:
- 07/11/2020 - HOBFLOPS CNNs: Hardware Optimized Bitsliced Floating-Point Operations Convolutional Neural Networks
- 06/30/2023 - Multigrid Methods using Block Floating Point Arithmetic
- 02/03/2023 - PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
- 04/11/2022 - Multiplier with Reduced Activities and Minimized Interconnect for Inner Product Arrays
- 10/09/2019 - QPyTorch: A Low-Precision Arithmetic Simulation Framework
- 07/07/2022 - MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores
- 09/19/2022 - SAMP: A Toolkit for Model Inference with Self-Adaptive Mixed-Precision