Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing

01/02/2019
by Yaman Umuroglu, et al.

Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lend themselves well to high-performance implementations. Many applications that depend on matrix multiplication can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depend on input data, rendering constant-precision solutions ineffective. BISMO, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing, previously used the excellent binary-operation performance of FPGAs to offer matrix multiplication performance that scales with the required precision and parallelism. We show how BISMO can be scaled up on Xilinx FPGAs using an arithmetic architecture that better utilizes 6-LUTs. The improved BISMO achieves a peak performance of 15.4 binary TOPS on the Ultra96 board with a Xilinx UltraScale+ MPSoC.
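To make the bit-serial idea concrete, the sketch below shows the standard decomposition of an integer matrix product into binary matrix products, each built from AND and popcount and weighted by a power of two. It is a minimal software illustration that assumes unsigned operands; the function names (`to_bit_planes`, `binary_matmul`, `bit_serial_matmul`) are hypothetical and do not describe BISMO's hardware datapath or its API.

```python
def to_bit_planes(matrix, bits):
    """Split a matrix of unsigned integers into `bits` 0/1 matrices, LSB first."""
    return [[[(x >> b) & 1 for x in row] for row in matrix] for b in range(bits)]


def binary_matmul(a_plane, b_plane):
    """Multiply two 0/1 matrices: each dot product is an AND followed by a popcount."""
    rows, inner, cols = len(a_plane), len(b_plane), len(b_plane[0])
    return [
        [sum(a_plane[i][k] & b_plane[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def bit_serial_matmul(a, b, a_bits, b_bits):
    """Compute a @ b for unsigned integers as a weighted sum of binary matmuls."""
    a_planes = to_bit_planes(a, a_bits)
    b_planes = to_bit_planes(b, b_bits)
    rows, cols = len(a), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    # One binary matmul per pair of bit positions: a_bits * b_bits in total.
    for i, a_plane in enumerate(a_planes):
        for j, b_plane in enumerate(b_planes):
            partial = binary_matmul(a_plane, b_plane)
            for r in range(rows):
                for c in range(cols):
                    # Bit i of a times bit j of b carries weight 2^(i + j).
                    result[r][c] += partial[r][c] << (i + j)
    return result


if __name__ == "__main__":
    a = [[1, 2], [3, 0]]
    b = [[2, 1], [1, 3]]
    print(bit_serial_matmul(a, b, a_bits=2, b_bits=2))
```

The number of binary matrix products equals the product of the two operand bit widths, which is why the cost of this formulation tracks the precision actually required. Signed operands would need a negative weight on the most significant bit plane, which this sketch omits.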


