Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing

12/06/2020
by Akshay Dua, et al.

This paper presents Systolic-CNN, an OpenCL-defined, scalable, run-time-flexible FPGA accelerator architecture optimized for accelerating the inference of various convolutional neural networks (CNNs) in multi-tenancy cloud/edge computing. Existing OpenCL-defined FPGA accelerators for CNN inference are insufficient due to their limited flexibility for supporting multiple CNN models at run time and their poor scalability, which results in underutilized FPGA resources and limited computational parallelism. Systolic-CNN adopts a highly pipelined and parallelized 1-D systolic array architecture that efficiently exploits both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized and can be easily adapted by users to achieve up to 100% utilization of the computation resources (i.e., DSP blocks) of a given FPGA. Systolic-CNN is also run-time-flexible in the context of multi-tenancy cloud/edge computing: it can be time-shared to accelerate a variety of CNN models at run time without the need to recompile the FPGA kernel hardware or reprogram the FPGA. Experimental results based on an Intel Arria/Stratix 10 GX FPGA development board show that the optimized single-precision implementation of Systolic-CNN achieves an average inference latency of 7 ms/2 ms, 84 ms/33 ms, 202 ms/73 ms, 1615 ms/873 ms, and 900 ms/498 ms per image for AlexNet, ResNet-50, ResNet-152, RetinaNet, and Light-weight RetinaNet, respectively. Code is available at https://github.com/PSCLab-ASU/Systolic-CNN.
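For illustration, the following is a minimal, hypothetical sketch (in Intel FPGA OpenCL) of one way a 1-D systolic array of processing elements (PEs) can be expressed: activations stream from PE to PE through on-chip channels (spatial parallelism), while each PE folds one vectorized multiply-accumulate per step into a running partial sum (temporal parallelism). The parameters NUM_PE, VEC, and K, the channel names, and the feeder/collector kernels implied in the comments are assumptions made for this sketch, not the authors' actual kernel code; see the linked repository for the real implementation.

    // Hypothetical sketch of a 1-D systolic PE array in Intel FPGA OpenCL.
    // Not the Systolic-CNN kernel code; illustrative only.
    #pragma OPENCL EXTENSION cl_intel_channels : enable

    #define NUM_PE 8   // length of the 1-D PE array (compile-time scaling knob)
    #define VEC    8   // input-channel vectorization per step
    #define K      64  // accumulation steps per output value (illustrative)

    typedef struct { float data[VEC]; } vec_t;

    // Neighbor-to-neighbor activation links and per-PE weight/result links.
    channel vec_t act_ch[NUM_PE] __attribute__((depth(64)));
    channel vec_t wgt_ch[NUM_PE] __attribute__((depth(64)));
    channel float out_ch[NUM_PE] __attribute__((depth(64)));

    __attribute__((max_global_work_dim(0)))
    __attribute__((autorun))
    __attribute__((num_compute_units(NUM_PE)))
    __kernel void pe(void) {
        const int id = get_compute_id(0);  // position of this PE in the array
        while (1) {
            float acc = 0.0f;
            for (int k = 0; k < K; k++) {
                vec_t act = read_channel_intel(act_ch[id]);   // from left neighbor (or feeder kernel)
                vec_t wgt = read_channel_intel(wgt_ch[id]);   // streamed by a weight loader (not shown)
                if (id != NUM_PE - 1)
                    write_channel_intel(act_ch[id + 1], act); // forward activations to the right neighbor
                #pragma unroll
                for (int v = 0; v < VEC; v++)
                    acc += act.data[v] * wgt.data[v];         // one vectorized MAC per step
            }
            write_channel_intel(out_ch[id], acc);             // drained by a collector kernel (not shown)
        }
    }

In a design of this style, scaling is a matter of changing NUM_PE and the vectorization factor at compile time, which is how a parameterized OpenCL kernel can trade DSP-block usage against computational parallelism on a given FPGA.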

