A CNN Accelerator on FPGA Using Depthwise Separable Convolution

09/03/2018
by Lin Bai, et al.

Convolutional neural networks (CNNs) have been widely deployed in computer vision and pattern recognition because of their high accuracy. However, large convolution operations are computationally intensive and often require a powerful computing platform such as a Graphics Processing Unit (GPU), which makes it difficult to apply CNNs on portable devices. State-of-the-art CNNs such as MobileNetV2 and Xception adopt depthwise separable convolution in place of standard convolution for embedded platforms. This significantly reduces the number of operations and parameters with only a limited loss in accuracy, and the resulting highly structured model is well suited to Field-Programmable Gate Array (FPGA) implementation. In this paper, a scalable, high-performance CNN accelerator optimized for depthwise separable convolution is proposed. The accelerator can be fitted into FPGAs of different sizes by balancing hardware resources against processing speed. As an example, MobileNetV2 is implemented on an Arria 10 SoC FPGA, and the results show that the accelerator can classify each ImageNet picture in 3.75 ms, or about 266.6 frames per second. This is a 20x speedup compared to a CPU.
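To make the claimed reduction in operations and parameters concrete, the sketch below counts multiply-accumulates (MACs) and weights for a standard k x k convolution and for its depthwise separable replacement (a k x k depthwise convolution followed by a 1 x 1 pointwise convolution). It is a textbook cost model, not the accelerator described in the paper: it assumes stride 1, "same" padding, and no biases, and the layer shape in the example is hypothetical. Under these assumptions the cost ratio works out to 1/c_out + 1/k^2, independent of the feature-map size.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution (stride 1, 'same' padding, h x w output)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k  # one k x k filter per input channel, no channel mixing
    pointwise = h * w * c_in * c_out  # 1 x 1 filters mix channels into c_out outputs
    return depthwise + pointwise


def conv_params(c_in, c_out, k, separable):
    """Weight count for either variant (biases ignored)."""
    if separable:
        return c_in * k * k + c_in * c_out
    return c_in * c_out * k * k


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    h, w, c_in, c_out, k = 56, 56, 128, 128, 3
    std = standard_conv_macs(h, w, c_in, c_out, k)
    sep = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"standard:  {std:,} MACs, {conv_params(c_in, c_out, k, False):,} weights")
    print(f"separable: {sep:,} MACs, {conv_params(c_in, c_out, k, True):,} weights")
    # The measured ratio matches the closed form 1/c_out + 1/k^2.
    print(f"ratio: {sep / std:.3f} vs 1/c_out + 1/k^2 = {1 / c_out + 1 / k**2:.3f}")
```

For this shape both ratios print 0.119, i.e. roughly 8.4x fewer MACs. Note that MobileNetV2 wraps depthwise separable convolutions in inverted residual blocks with a channel expansion step, so per-layer savings in the real network differ from this single-layer estimate.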

research
07/09/2021

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs

The combination of Winograd's algorithm and systolic array architecture ...
research
05/14/2020

ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network

Image Understanding is becoming a vital feature in ever more application...
research
11/21/2018

Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

Using FPGAs to accelerate ConvNets has attracted significant attention i...
research
12/07/2020

BinArray: A Scalable Hardware Accelerator for Binary Approximated CNNs

Deep Convolutional Neural Networks (CNNs) have become state-of-the art f...
research
07/15/2017

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

State-of-the-art convolutional neural networks are enormously costly in ...
research
05/29/2020

A Unified Hardware Architecture for Convolutions and Deconvolutions in CNN

In this paper, a scalable neural network hardware architecture for image...
research
07/27/2021

A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs

Although high-performance deep neural networks are in high demand in edg...
