
ILMPQ : An Intra-Layer Multi-Precision Deep Neural Network Quantization framework for FPGA

by Sung-En Chang et al.
Northeastern University

This work targets the commonly used FPGA (field-programmable gate array) devices as the hardware platform for DNN edge computing. We focus on DNN quantization as the main model compression technique. The novelty of this work is that our quantization method supports multiple precisions along the intra-layer dimension, whereas existing quantization methods apply multi-precision quantization along the inter-layer dimension. The intra-layer multi-precision method can unify the hardware configurations across different layers to reduce computation overhead while preserving model accuracy comparable to the inter-layer approach. Our proposed ILMPQ DNN quantization framework achieves 70.73% Top-1 accuracy with ResNet-18 on the ImageNet dataset. We also validate the proposed ILMPQ framework on two FPGA devices, the Xilinx XC7Z020 and XC7Z045, achieving a 3.65x speedup in end-to-end inference time on ImageNet compared with the fixed-point quantization method.
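To make the intra-layer idea concrete, the sketch below quantizes different row groups of a single layer's weight matrix at different bit-widths. This is a minimal illustration, not the paper's exact scheme: the symmetric uniform quantizer, the row-group assignment, and the function names are all assumptions introduced here for clarity.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform fixed-point quantization to the given bit-width.

    Maps values onto 2**(bits-1) - 1 levels per sign, a common baseline
    quantizer (an illustrative choice, not necessarily the paper's).
    """
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return w.copy()
    return np.round(w / scale) * scale

def intra_layer_multi_precision(weights, bit_assignment):
    """Quantize each row group of one layer at its own precision.

    `bit_assignment` is a list of (row_slice, bits) pairs, so multiple
    precisions coexist *within* a single layer, in contrast to
    inter-layer schemes that fix one precision per layer.
    """
    q = np.empty_like(weights)
    for rows, bits in bit_assignment:
        q[rows] = quantize_uniform(weights[rows], bits)
    return q

# Hypothetical example: an 8x4 weight matrix with the first half of the
# rows quantized to 4 bits and the second half to 8 bits.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4))
wq = intra_layer_multi_precision(w, [(slice(0, 4), 4), (slice(4, 8), 8)])
```

Because every layer can contain the same mix of row-group precisions, the same hardware compute unit configuration can serve all layers, which is the overhead reduction the abstract describes.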


MSP: An FPGA-Specific Mixed-Scheme, Multi-Precision Deep Neural Network Quantization Framework

With the tremendous success of deep learning, there exists imminent need...

RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions

This work proposes a novel Deep Neural Network (DNN) quantization framew...

Technical Report: NEMO DNN Quantization for Deployment Model

This technical report aims at defining a formal framework for Deep Neura...

Implementation of the Logistic Map with FPGA using 32 bits fixed point standard

This article presents a design of the logistic map by means of FPGA (Fie...

Robustness of Neural Networks to Parameter Quantization

Quantization, a commonly used technique to reduce the memory footprint o...

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Deep Neural Networks (DNNs) have achieved extraordinary performance in v...

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations

We propose precision gating (PG), an end-to-end trainable dynamic dual-p...