Layer-specific Optimization for Mixed Data Flow with Mixed Precision in FPGA Design for CNN-based Object Detectors

09/03/2020
by Duy Thanh Nguyen, et al.

Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to low processing speed and large power dissipation. Although the characteristics of the different layers in a CNN are often quite different, previous hardware designs have applied a common optimization scheme to all of them. This paper proposes a layer-specific design that employs a different organization optimized for each layer. The proposed design applies two layer-specific optimizations: layer-specific mixed data flow and layer-specific mixed precision. The mixed data flow aims to minimize off-chip access while demanding minimal on-chip memory (BRAM) resources of an FPGA device. The mixed-precision quantization achieves both lossless accuracy and aggressive model compression, thereby further reducing off-chip access. A Bayesian optimization approach is used to select the best sparsity for each layer, achieving the best trade-off between accuracy and compression. This mixing scheme allows the entire network model to be stored in the BRAMs of the FPGA, aggressively reducing off-chip access and thereby achieving a significant performance enhancement. The model size is reduced by a factor of 22.66-28.93 compared with the full-precision network, with negligible accuracy degradation on the VOC, COCO, and ImageNet datasets. Furthermore, the combination of mixed data flow and mixed precision significantly outperforms previous works in terms of throughput, off-chip access, and on-chip memory requirement.
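To make the idea of layer-specific precision concrete, below is a minimal sketch of a per-layer bit-width search. It is an illustrative reconstruction, not the authors' implementation: the paper drives its search with Bayesian optimization over per-layer sparsity, whereas this sketch swaps in a simple greedy search over candidate bit-widths under a quantization-error budget. The quantize_uniform helper, the error budget, the candidate bit-widths, and the layer shapes are all assumptions made for the example.

    import numpy as np

    def quantize_uniform(w, bits):
        # Uniform symmetric quantization of a weight tensor to `bits` bits.
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        return np.round(w / scale) * scale

    def pick_bits_per_layer(layers, error_budget=1e-3, candidates=(2, 3, 4, 6, 8)):
        # Greedily pick the narrowest bit-width per layer whose mean-squared
        # quantization error stays under the budget; fall back to the widest.
        chosen = {}
        for name, w in layers.items():
            for bits in candidates:
                if np.mean((w - quantize_uniform(w, bits)) ** 2) <= error_budget:
                    chosen[name] = bits
                    break
            else:
                chosen[name] = candidates[-1]
        return chosen

    # Hypothetical convolution weights standing in for a detector's layers.
    rng = np.random.default_rng(0)
    layers = {f"conv{i}": rng.normal(scale=0.1, size=(64, 64, 3, 3))
              for i in range(4)}
    print(pick_bits_per_layer(layers))  # e.g. {'conv0': 4, 'conv1': 4, ...}

The point of the sketch is only that each layer can end up with its own precision; in the paper, that per-layer freedom is what shrinks the model enough to fit entirely in BRAM and eliminate most off-chip traffic.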


Related research

10/12/2021 - Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression
Existing deep convolutional neural networks (CNNs) generate massive inte...

01/11/2019 - Low Precision Constant Parameter CNN on FPGA
We report FPGA implementation results of low precision CNN convolution l...

06/15/2021 - ShortcutFusion: From Tensorflow to FPGA-based accelerator with reuse-aware memory allocation for shortcut data
Residual block is a very common component in recent state-of-the-art CNN...

12/22/2020 - FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations
Binary neural networks (BNNs) have 1-bit weights and activations. Such n...

01/17/2019 - CodeX: Bit-Flexible Encoding for Streaming-based FPGA Acceleration of DNNs
This paper proposes CodeX, an end-to-end framework that facilitates enco...

12/18/2017 - Automated flow for compressing convolution neural networks for efficient edge-computation with FPGA
Deep convolutional neural networks (CNN) based solutions are the current...

04/10/2020 - SMART Paths for Latency Reduction in ReRAM Processing-In-Memory Architecture for CNN Inference
This research work proposes a design of an analog ReRAM-based PIM (proce...