ShortcutFusion: From Tensorflow to FPGA-based accelerator with reuse-aware memory allocation for shortcut data

06/15/2021
by   Duy Thanh Nguyen, et al.

Residual blocks are a very common component in recent state-of-the-art CNNs such as EfficientNet or EfficientDet, and shortcut data accounts for nearly 40% of feature-map accesses in ResNet152 [8]. Most previous DNN compilers and accelerators ignore shortcut data optimization. This paper presents ShortcutFusion, an optimization tool for FPGA-based accelerators with reuse-aware static memory allocation for shortcut data, which maximizes on-chip data reuse under given resource constraints. From TensorFlow DNN models, the proposed design generates instruction sets for a group of nodes, using an optimized data-reuse scheme for each residual block. The accelerator design, implemented on the Xilinx KCU1500 FPGA card, significantly outperforms the NVIDIA RTX 2080 Ti, Titan Xp, and GTX 1080 Ti for EfficientNet inference. Compared to the RTX 2080 Ti, the proposed design is 1.35-2.33x faster and 6.7-7.9x more power efficient. Compared to a baseline in which the weights, inputs, and outputs are each accessed from off-chip memory exactly once per layer, ShortcutFusion reduces DRAM access by 47.8-84.8% for networks including ResNet152 and EfficientNet. Given a buffer size similar to that of ShortcutMining [8], which also mines shortcut data in hardware, the proposed work reduces off-chip accesses for feature maps by 5.27x while accessing weights from off-chip memory exactly once.
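The benefit of keeping shortcut data on chip can be illustrated with a simple traffic model. The sketch below (not the paper's tool; layer sizes, block count, and the per-block access pattern are hypothetical) counts off-chip feature-map bytes for a chain of residual blocks, comparing a baseline that spills the shortcut tensor to DRAM and reads it back at the elementwise add against a reuse-aware allocation that pins it in an on-chip buffer:

```python
# Illustrative sketch only: models off-chip feature-map traffic for a chain
# of residual blocks (out = conv(x) + x). All shapes are hypothetical.

def residual_chain_dram_traffic(fmap_bytes, num_blocks, keep_shortcut_on_chip):
    """Return total off-chip feature-map bytes moved over `num_blocks`
    residual blocks, each computing out = conv(x) + x."""
    traffic = 0
    for _ in range(num_blocks):
        traffic += fmap_bytes          # read x for the conv path
        if not keep_shortcut_on_chip:
            traffic += 2 * fmap_bytes  # spill shortcut copy, read it back at the add
        traffic += fmap_bytes          # write the block output
    return traffic

fmap = 56 * 56 * 256 * 2               # one FP16 feature map (hypothetical shape)
baseline = residual_chain_dram_traffic(fmap, num_blocks=8, keep_shortcut_on_chip=False)
fused = residual_chain_dram_traffic(fmap, num_blocks=8, keep_shortcut_on_chip=True)
print(f"baseline: {baseline} B, reuse-aware: {fused} B, saved: {1 - fused / baseline:.0%}")
```

In this toy model the shortcut spill and reload account for half of all feature-map traffic, so pinning the shortcut on chip saves 50%; the savings the paper reports additionally depend on per-layer buffer fitting under the FPGA's resource constraints.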

