Reduced-Precision Floating-Point Arithmetic in Systolic Arrays with Skewed Pipelines

04/04/2023
by Dionysios Filippas, et al.

The acceleration of deep-learning kernels in hardware relies on matrix multiplications that are executed efficiently on Systolic Arrays (SA). To effectively trade off deep-learning training/inference quality against hardware cost, SA accelerators employ reduced-precision Floating-Point (FP) arithmetic. In this work, we demonstrate the need for new pipeline organizations that reduce the latency and improve the energy efficiency of reduced-precision FP operators for the chained multiply-add operation imposed by the structure of the SA. The proposed skewed pipeline design reorganizes the pipelined operation of the FP multiply-add units to enable new forwarding paths for the exponent logic, which allow for parallel execution of the pipeline stages of consecutive PEs. As a result, the latency of the matrix multiplication operation within the SA is significantly reduced with minimal hardware cost, thereby yielding an energy reduction of 8
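To make the chained multiply-add structure concrete, the sketch below simulates the dataflow of a weight-stationary systolic array in plain Python. The function name and the organization are illustrative, not taken from the paper; in particular, the skewed-pipeline exponent forwarding the abstract describes is a hardware-level optimization that this behavioral model does not capture. It only shows the chained fused multiply-add along a PE column that the proposed pipeline targets.

```python
def systolic_matmul(A, B):
    """Behavioral sketch of C = A @ B on a weight-stationary systolic
    array (illustrative; not the paper's implementation).

    Each PE(k, j) holds weight B[k][j]. A partial sum enters a column
    from the top and is updated by each PE in turn with one fused
    multiply-add: psum = psum + a * w. This per-PE chaining is the
    dependency that serializes pipeline stages across consecutive PEs
    and that the skewed pipeline reorganizes in hardware.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):          # row i of A streams in (skewed in time on real HW)
        for j in range(n):      # output column j of the array
            psum = 0.0
            for k in range(n):  # partial sum travels through n chained PEs
                psum += A[i][k] * B[k][j]   # one multiply-add per PE
            C[i][j] = psum
    return C
```

For example, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19.0, 22.0], [43.0, 50.0]]`, matching an ordinary matrix product; the difference on real hardware lies in how the n chained multiply-adds per output are pipelined, which is exactly where the skewed organization saves latency.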

