Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA

08/09/2022
by Cecilia Latotzke, et al.

Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs, at the expense of some accuracy, is to reduce the word length of weights and/or activations. Layer-wise mixed-precision quantization yields more efficient results, but it also inflates the design space. In this work, we present an in-depth quantitative methodology to efficiently explore the design space considering the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the various design entry levels from the architectural down to the logic level, and laterally covers optimization from processing elements to dataflow for an efficient mixed-precision CNN accelerator. Our resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs results in competitive accuracy-throughput trade-offs: 245 frames/s with 87.48% accuracy for ResNet-18, and 92.9% accuracy for ResNet-152. The required memory footprint for parameters is reduced by 4.9x and 9.4x, respectively, compared to the respective floating-point baseline.
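
To make the link between layer-wise word lengths and parameter memory footprint concrete, the following is a minimal Python sketch, not the authors' implementation. The layer sizes and bit widths are hypothetical and chosen only for illustration, the function names `quantize_uniform` and `footprint_bits` are invented here, and symmetric uniform quantization is assumed as the quantizer (the abstract does not state which scheme the paper uses).

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantization of floats to signed `bits`-bit integers.

    Assumed scheme for illustration; returns the integer codes and the scale
    needed to map them back to approximate real values.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit signed
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def footprint_bits(layers):
    """Total parameter storage in bits for (parameter_count, word_length) pairs."""
    return sum(count * bits for count, bits in layers)


# Hypothetical three-layer network: (parameter count, word length in bits).
mixed_layers = [(1000, 8), (4000, 4), (2000, 2)]

# 32-bit floating-point baseline with the same parameter counts.
baseline_layers = [(count, 32) for count, _ in mixed_layers]

reduction = footprint_bits(baseline_layers) / footprint_bits(mixed_layers)

# Quantize a few example weights to 4 bits.
codes, scale = quantize_uniform([0.9, -1.0, 0.2], bits=4)
```

In this toy configuration the layer with the most parameters gets a short word length, which is where most of the footprint saving comes from. Choosing a word length per layer is also what enlarges the design space the abstract refers to.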


