On the RTL Implementation of FINN Matrix Vector Compute Unit

01/27/2022
by Syed Asad Alam, et al.

FPGA-based accelerators are becoming increasingly popular for deep neural networks (DNNs) because their performance can scale with the degree of specialization, through dataflow architectures or custom data types. To lower the barrier for software engineers and data scientists to adopt FPGAs, C++- and OpenCL-based design entries with high-level synthesis (HLS) have been introduced. They provide a higher level of abstraction than register-transfer level (RTL)-based design. HLS offers faster development, better maintainability, and more flexibility in design-space exploration when evaluating options for multi-dimensional tensors, convolutional layers, or parallelism. HLS has therefore been adopted by DNN accelerator generation frameworks such as FINN and hls4ml. In this paper, we present an alternative RTL backend library for FINN and evaluate, across a spectrum of design dimensions, the RTL-based implementation against the original HLS variant. We show that for smaller design parameters, RTL produces significantly smaller circuits. For larger circuits, the look-up table (LUT) count of the RTL-based design is slightly higher, by up to around 15%, but HLS consistently requires more flip-flops (FFs) (an orders-of-magnitude increase) and block RAMs (BRAMs) (2× more). This also affects the critical path delay: RTL produces significantly faster circuits, by up to 80%. Furthermore, RTL benefits from at least a 10× reduction in synthesis time. Finally, the results were validated on a real-world use case, a multi-layer perceptron (MLP) network for network intrusion detection. Overall, since HLS frameworks code-generate the hardware design, ease of design entry matters less than the reduction in synthesis time and the resource savings, which may make the RTL abstraction an attractive alternative.
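For context, FINN's matrix-vector compute unit (MVU) computes a layer's weight-matrix times activation-vector product, folded so that PE output rows are processed in parallel, each consuming SIMD input elements per cycle. The minimal C++ sketch below illustrates that folded computation in software terms; the constants, names, and signature are illustrative assumptions, not FINN's actual HLS code.

```cpp
#include <cstdint>
#include <cstddef>

// Illustrative FINN-style folding parameters (hypothetical values):
constexpr std::size_t PE   = 4;   // output rows computed in parallel
constexpr std::size_t SIMD = 8;   // input elements consumed per step
constexpr std::size_t ROWS = 64;  // matrix height, a multiple of PE
constexpr std::size_t COLS = 32;  // matrix width, a multiple of SIMD

// Matrix-vector product, folded over PE rows and SIMD columns.
void mvu(const int8_t weights[ROWS][COLS],
         const int8_t input[COLS],
         int32_t output[ROWS]) {
    for (std::size_t r = 0; r < ROWS; r += PE) {        // fold over rows
        int32_t acc[PE] = {0};
        for (std::size_t c = 0; c < COLS; c += SIMD) {  // fold over columns
            for (std::size_t pe = 0; pe < PE; ++pe)     // parallel in hardware
                for (std::size_t s = 0; s < SIMD; ++s)  // parallel in hardware
                    acc[pe] += weights[r + pe][c + s] * input[c + s];
        }
        for (std::size_t pe = 0; pe < PE; ++pe)
            output[r + pe] = acc[pe];
    }
}
```

In hardware, the two inner loops would be fully unrolled so that PE × SIMD multiply-accumulates occur every cycle; the paper compares an HLS and an RTL realization of this kind of folded datapath.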
