DeepBurning-MixQ: An Open Source Mixed-Precision Neural Network Accelerator Design Framework for FPGAs

08/22/2023
by Erjing Luo et al.

Mixed-precision neural networks (MPNNs), which use just enough data width for each part of a deep learning task, promise significant advantages in both inference accuracy and computing overhead. FPGAs, with their fine-grained reconfigurability, can adapt processing to distinct data widths and models, and hence can, in principle, unleash the full potential of MPNNs. Nevertheless, commodity DPUs on FPGAs mostly emphasize generality and offer limited support for MPNNs, especially those with lower data widths. In addition, the primitive DSP blocks in FPGAs usually have much larger data widths than MPNNs require and have not yet been sufficiently co-explored with MPNN design. To this end, we propose an open-source MPNN accelerator design framework specifically tailored for FPGAs. The framework provides a systematic DSP-packing algorithm that packs multiple low-data-width MACs into a single primitive DSP, enabling efficient implementation of MPNNs. Meanwhile, we fold DSP packing efficiency into MPNN quantization within a unified neural architecture search (NAS) framework, so that quantization is aware of DSP overhead and MPNN performance and accuracy are optimized concurrently. Finally, the optimized MPNN is fine-tuned for a fully pipelined, HLS-based neural network accelerator template that makes the best use of available resources for higher performance. Our experiments show that the accelerators produced by the proposed framework achieve substantial advantages in performance, resource utilization, and inference accuracy for MPNNs over both handcrafted counterparts and prior hardware-aware neural network accelerators on FPGAs.
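
The core idea behind DSP packing can be illustrated with a short sketch. The Python snippet below is a minimal emulation of the general lane-packing technique, in which two low-precision multiplications that share one operand are carried out by a single wide multiply (in the spirit of the well-known INT8 packing on DSP48E2 blocks); `pack_mul` and `lane_bits` are illustrative names, and this is a sketch of the general idea, not the paper's actual packing algorithm.

```python
import random

def pack_mul(a_hi, a_lo, w, lane_bits):
    """Emulate two low-precision multiplies sharing one wide multiplier.

    a_hi, a_lo: unsigned activations (e.g. 4-bit, post-ReLU)
    w:          signed weight shared by both products
    lane_bits:  lane width k, chosen so each product a*w fits in
                k bits two's complement: -2**(k-1) <= a*w < 2**(k-1)
    """
    packed = (a_hi << lane_bits) + a_lo   # one wide input word
    p = packed * w                        # the single wide multiply a DSP would perform
    # Recover the low lane with sign extension...
    lo = p & ((1 << lane_bits) - 1)
    if lo >= 1 << (lane_bits - 1):
        lo -= 1 << lane_bits
    # ...then remove it before shifting out the high lane.
    hi = (p - lo) >> lane_bits
    return hi, lo

# Sanity check: 4-bit unsigned activations x 8-bit signed weights.
# Each product needs at most 12 bits, so two lanes (24 bits) plus the
# 8-bit weight fit comfortably within a 27x18 DSP multiplier.
for _ in range(10000):
    a_hi, a_lo = random.randrange(16), random.randrange(16)
    w = random.randrange(-128, 128)
    hi, lo = pack_mul(a_hi, a_lo, w, lane_bits=12)
    assert (hi, lo) == (a_hi * w, a_lo * w)
```

Accumulating several packed products inside the DSP before lane extraction additionally requires guard bits between lanes, so that carries from one lane cannot corrupt the next.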
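
Likewise, making the quantization search DSP-aware amounts to folding a packing-ratio cost model into the NAS objective. The sketch below shows one plausible formulation under assumed packing ratios; `MACS_PER_DSP`, `layer_dsp_cost`, `search_objective`, and `lam` are hypothetical names, and the ratios are illustrative numbers rather than results from the paper.

```python
import math

# Hypothetical MACs-per-DSP packing ratios for (activation, weight) bit
# widths -- illustrative values only, not measurements from the paper.
MACS_PER_DSP = {(8, 8): 2, (4, 8): 2, (4, 4): 4, (2, 4): 4, (2, 2): 8}

def layer_dsp_cost(parallel_macs, act_bits, wgt_bits):
    """DSPs needed to issue `parallel_macs` MACs per cycle at this precision."""
    return math.ceil(parallel_macs / MACS_PER_DSP[(act_bits, wgt_bits)])

def search_objective(task_loss, layers, lam=1e-4):
    """Objective a quantization/NAS search could minimize: the task loss
    plus a penalty proportional to the total DSP budget implied by the
    chosen per-layer bit widths."""
    total_dsps = sum(layer_dsp_cost(n, a, w) for n, a, w in layers)
    return task_loss + lam * total_dsps

# Example: three layers, each given as (parallel MACs, act bits, weight bits).
print(search_objective(0.42, [(256, 4, 8), (512, 4, 4), (128, 8, 8)]))
```

Under such a cost model, the search naturally prefers bit-width assignments that pack densely into DSPs, rather than minimizing bit width alone.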

Related research:

- RHNAS: Realizable Hardware and Neural Architecture Search (06/17/2021)
- QADAM: Quantization-Aware DNN Accelerator Modeling for Pareto-Optimality (05/20/2022)
- N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores (12/15/2021)
- QUIDAM: A Framework for Quantization-Aware DNN Accelerator and Model Co-Exploration (06/30/2022)
- OHQ: On-chip Hardware-aware Quantization (09/05/2023)
- Any-Width Networks (12/06/2020)
