BRAMAC: Compute-in-BRAM Architectures for Multiply-Accumulate on FPGAs

04/08/2023
by Yuzong Chen, et al.

Deep neural network (DNN) inference using reduced integer precision has been shown to achieve significant improvements in memory utilization and compute throughput with little or no accuracy loss compared to full-precision floating-point. Modern FPGA-based DNN inference relies heavily on the on-chip block RAM (BRAM) for model storage and the digital signal processing (DSP) unit for implementing the multiply-accumulate (MAC) operation, a fundamental DNN primitive. In this paper, we enhance the existing BRAM to also compute MAC by proposing BRAMAC (Compute-in-BRAM Architectures for Multiply-Accumulate). BRAMAC supports 2's complement 2- to 8-bit MAC in a small dummy BRAM array using a hybrid bit-serial bit-parallel data flow. Unlike previous compute-in-BRAM architectures, BRAMAC allows read/write access to the main BRAM array while computing in the dummy BRAM array, enabling both persistent and tiling-based DNN inference. We explore two BRAMAC variants: BRAMAC-2SA (with 2 synchronous dummy arrays) and BRAMAC-1DA (with 1 double-pumped dummy array). BRAMAC-2SA/BRAMAC-1DA can boost the peak MAC throughput of a large Arria-10 FPGA by 2.6×/2.1×, 2.3×/2.0×, and 1.9×/1.7× for 2-bit, 4-bit, and 8-bit precisions, respectively, at the cost of a modest increase in FPGA core area (6.8% for BRAMAC-2SA). By adding BRAMAC-2SA/BRAMAC-1DA to a state-of-the-art tiling-based DNN accelerator, an average speedup of 2.05×/1.7× and 1.33×/1.52× can be achieved for AlexNet and ResNet-34, respectively, across different model precisions.

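To make the MAC arithmetic concrete, the sketch below is a minimal Python model of a hybrid bit-serial bit-parallel 2's-complement MAC: the bits of one operand vector (here, the activations) are streamed serially, one bit position per step, while every weight in the tile is applied in parallel, and the sign is handled by giving the most significant bit its negative 2's-complement weight. The function names and the choice of which operand is serialized are illustrative assumptions for this software model, not the exact BRAMAC dummy-array dataflow described in the paper.

    def twos_complement_bits(x, n_bits):
        """Bits of x in n-bit 2's complement, LSB first (valid for -2**(n_bits-1) <= x < 2**(n_bits-1))."""
        return [(x >> k) & 1 for k in range(n_bits)]

    def hybrid_bit_serial_mac(activations, weights, n_bits):
        """Dot product via shift-and-add: bit-serial over activation bit positions,
        bit-parallel across all (activation, weight) pairs at each step."""
        bit_planes = [twos_complement_bits(a, n_bits) for a in activations]
        acc = 0
        for k in range(n_bits):
            # Bit-parallel partial sum for bit position k across the whole tile.
            partial = sum(bits[k] * w for bits, w in zip(bit_planes, weights))
            # The MSB of a 2's-complement operand carries weight -2**(n_bits-1).
            acc += (-partial if k == n_bits - 1 else partial) << k
        return acc

    # Sanity check against a plain dot product.
    acts, wts = [-3, 5, 2, -1], [4, -2, 7, 3]
    assert hybrid_bit_serial_mac(acts, wts, n_bits=4) == sum(a * w for a, w in zip(acts, wts))

Viewed this way, an n-bit MAC completes in roughly n bit-serial steps, and the inner bit-parallel sum is the part a compute-in-memory array can evaluate in a single access, which helps explain why the lower precisions see the larger peak-throughput gains quoted above.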

Related research

- 07/30/2019 · Deep Learning Training on the Edge with Low-Precision Posits
  Recently, the posit numerical format has shown promise for DNN data repr...
- 03/02/2020 · A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision
  The excellent performance of modern deep neural networks (DNNs) comes at...
- 03/23/2022 · CoMeFa: Compute-in-Memory Blocks for FPGAs
  Block RAMs (BRAMs) are the storage houses of FPGAs, providing extensive ...
- 02/07/2021 · CrossStack: A 3-D Reconfigurable RRAM Crossbar Inference Engine
  Deep neural network inference accelerators are rapidly growing in import...
- 05/15/2023 · Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC with 2-to-8b DNN Acceleration and 30%-Boost Adaptive Body Biasing
  Emerging Artificial Intelligence-enabled Internet-of-Things (AI-IoT) Sys...
- 03/13/2018 · An FPGA-Based Hardware Accelerator for Energy-Efficient Bitmap Index Creation
  Bitmap index is recognized as a promising candidate for online analytics...
- 06/21/2016 · An Area-Efficient FPGA Overlay using DSP Block based Time-multiplexed Functional Units
  Coarse grained overlay architectures improve FPGA design productivity by...
