WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs

07/09/2021
by Xinheng Liu, et al.

The combination of Winograd's algorithm and the systolic array architecture has demonstrated the capability of improving DSP efficiency in accelerating convolutional neural networks (CNNs) on FPGA platforms. However, handling arbitrary convolution kernel sizes in FPGA-based Winograd processing elements and supporting efficient data access remain underexplored. In this work, we are the first to propose an optimized Winograd processing element (WinoPE) that naturally supports multiple convolution kernel sizes with the same amount of computing resources while maintaining high runtime DSP efficiency. Using the proposed WinoPE, we construct a highly efficient systolic array accelerator, termed WinoCNN. We also propose a dedicated memory subsystem to optimize data access. Based on the accelerator architecture, we build accurate resource and performance models to explore optimal accelerator configurations under different resource constraints. We implement our proposed accelerator on multiple FPGAs, and it outperforms the state-of-the-art designs in terms of both throughput and DSP efficiency. Our implementation achieves a DSP efficiency of up to 1.33 GOPS/DSP and a throughput of up to 3.1 TOPS on the Xilinx ZCU102 FPGA, which are 29.1% and 20.0% better, respectively, than the best previously reported solutions.
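For context on the transform the abstract builds on, the sketch below shows the standard Winograd F(2x2, 3x3) tile computation (Lavin and Gray's transform matrices) that accelerators of this kind map onto DSPs. It is a minimal NumPy illustration of the general technique under our own naming (winograd_f2x2_3x3), not the paper's WinoPE implementation or its kernel-sharing scheme.

```python
import numpy as np

# Standard transform matrices for Winograd F(2x2, 3x3).
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])        # kernel transform (4x3)
Bt = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)  # input transform (4x4)
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)   # output transform (2x4)

def winograd_f2x2_3x3(tile, kernel):
    """Compute a 2x2 output block from a 4x4 input tile and a 3x3 kernel."""
    U = G @ kernel @ G.T    # transformed kernel, 4x4
    V = Bt @ tile @ Bt.T    # transformed input tile, 4x4
    M = U * V               # 16 elementwise multiplies replace 36 direct MACs
    return At @ M @ At.T    # 2x2 output block

# Sanity check against direct (valid) cross-correlation, as used in CNNs.
rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
direct = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                   for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), direct)
```

Each 2x2 output tile costs 16 elementwise multiplications instead of the 36 required by direct 3x3 convolution, which is the source of the DSP savings the abstract refers to.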

Related research

- 02/20/2017: A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks
  FPGA-based hardware accelerators for convolutional neural networks (CNNs...
- 09/03/2018: A CNN Accelerator on FPGA Using Depthwise Separable Convolution
  Convolutional neural networks (CNNs) have been widely deployed in the fi...
- 02/09/2020: FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA
  Autoregressive convolutional neural networks (CNNs) have been widely exp...
- 08/28/2022: FFCNN: Fast FPGA based Acceleration for Convolution neural network inference
  We present a new efficient OpenCL-based Accelerator for large scale Conv...
- 09/26/2019: Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator
  Recurrent Neural Network (RNN) applications form a major class of AI-pow...
- 05/29/2020: A Unified Hardware Architecture for Convolutions and Deconvolutions in CNN
  In this paper, a scalable neural network hardware architecture for image...
- 05/08/2018: FlashAbacus: A Self-Governing Flash-Based Accelerator for Low-Power Systems
  Energy efficiency and computing flexibility are some of the primary desi...
