Hardware-Efficient Template-Based Deep CNNs Accelerator Design

07/21/2022
by Azzam Alhussain, et al.

Accelerating convolutional neural networks (CNNs) on edge devices has recently achieved remarkable performance in image classification and object detection applications. This paper proposes an efficient and scalable CNN accelerator design for SoC-FPGAs that takes pre-trained weights with 16-bit fixed-point quantization, together with the target hardware specification, and generates an optimized template that achieves a better trade-off between performance and resource utilization. The template analyzes the computational workload, data dependencies, and external memory bandwidth, and uses a loop tiling transformation along with dataflow modeling to convert convolutional and fully connected layers into vector multiplications between input and output feature maps, which results in a single on-chip compute unit. The accelerator was evaluated on the AlexNet, VGG16, and LeNet networks and ran at 200 MHz with a peak performance of 230 GOP/s, depending on the ZYNQ board and on a state-space exploration of different compute unit configurations during simulation and synthesis. Lastly, the proposed methodology was benchmarked against previous work on the Ultra96 board for performance comparison.
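The loop tiling idea mentioned in the abstract can be illustrated with a minimal software sketch. It is not the authors' template: the Q8.8 split of the 16-bit word, the tile sizes `tm` and `tn`, and every function name below are assumptions chosen for illustration. The sketch quantizes values to 16-bit fixed point and shows that tiling the channel loops of a convolution leaves the result unchanged while exposing a small `tm` x `tn` block of multiply-accumulates, the kind of block a single compute unit could evaluate in parallel.

```python
# Illustrative sketch only: Q8.8 format, tile sizes, and names are assumptions.

FRAC_BITS = 8  # assumed fractional bits of the 16-bit fixed-point word


def quantize(x, frac_bits=FRAC_BITS):
    """Round a float to a signed 16-bit fixed-point integer, saturating."""
    q = int(round(x * (1 << frac_bits)))
    return max(-32768, min(32767, q))


def _output_buffer(inp, wgt):
    """Allocate a zeroed [C_out][H-K+1][W-K+1] output (stride 1, no padding)."""
    k = len(wgt[0][0])
    h_out = len(inp[0]) - k + 1
    w_out = len(inp[0][0]) - k + 1
    return [[[0] * w_out for _ in range(h_out)] for _ in range(len(wgt))]


def conv_reference(inp, wgt):
    """Direct convolution. inp: [C_in][H][W], wgt: [C_out][C_in][K][K]."""
    out = _output_buffer(inp, wgt)
    k = len(wgt[0][0])
    for m in range(len(wgt)):
        for r in range(len(out[0])):
            for c in range(len(out[0][0])):
                for n in range(len(inp)):
                    for i in range(k):
                        for j in range(k):
                            out[m][r][c] += wgt[m][n][i][j] * inp[n][r + i][c + j]
    return out


def conv_tiled(inp, wgt, tm=2, tn=2):
    """Same convolution with output/input channel loops tiled by (tm, tn).

    Accumulation uses Python integers, standing in for an accumulator
    wider than 16 bits; products are not rescaled back to Q8.8 here.
    """
    out = _output_buffer(inp, wgt)
    c_out, c_in, k = len(wgt), len(inp), len(wgt[0][0])
    for m0 in range(0, c_out, tm):            # output-channel tiles
        for n0 in range(0, c_in, tn):         # input-channel tiles
            for r in range(len(out[0])):
                for c in range(len(out[0][0])):
                    for i in range(k):
                        for j in range(k):
                            # tm x tn block of multiply-accumulates
                            for m in range(m0, min(m0 + tm, c_out)):
                                for n in range(n0, min(n0 + tn, c_in)):
                                    out[m][r][c] += (
                                        wgt[m][n][i][j] * inp[n][r + i][c + j]
                                    )
    return out
```

Because the arithmetic is on integers, `conv_tiled` returns exactly the same values as `conv_reference` for any tile sizes, including ones that do not divide the channel counts. A fully connected layer fits the same loop nest as the special case of a 1x1 kernel on a 1x1 feature map.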

research
04/27/2020

A scalable and efficient convolutional neural network accelerator using HLS for a System on Chip design

This paper presents a configurable Convolutional Neural Network Accelera...
research
11/15/2019

TinyCNN: A Tiny Modular CNN Accelerator for Embedded FPGA

In recent years, Convolutional Neural Network (CNN) based methods have a...
research
10/12/2021

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Existing deep convolutional neural networks (CNNs) generate massive inte...
research
07/28/2021

SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication

This paper proposes a new hardware accelerator for sparse convolutional ...
research
01/18/2022

Hardware-Efficient Deconvolution-Based GAN for Edge Computing

Generative Adversarial Networks (GAN) are cutting-edge algorithms for ge...
research
12/16/2019

A flexible FPGA accelerator for convolutional neural networks

Though CNNs are highly parallel workloads, in the absence of efficient o...
research
05/28/2019

CompactNet: Platform-Aware Automatic Optimization for Convolutional Neural Networks

Convolutional Neural Network (CNN) based Deep Learning (DL) has achieved...
