
ILMPQ : An Intra-Layer Multi-Precision Deep Neural Network Quantization framework for FPGA

by Sung-En Chang et al.
Northeastern University

This work targets the commonly used FPGA (field-programmable gate array) devices as the hardware platform for DNN edge computing. We focus on DNN quantization as the main model compression technique. The novelty of this work is that our quantization method supports multiple precisions along the intra-layer dimension, whereas existing quantization methods apply multi-precision quantization along the inter-layer dimension. The intra-layer multi-precision method can unify the hardware configurations across different layers to reduce computation overhead while preserving model accuracy comparable to the inter-layer approach. Our proposed ILMPQ DNN quantization framework achieves 70.73% Top-1 accuracy with ResNet-18 on the ImageNet dataset. We also validate the proposed ILMPQ framework on two FPGA devices, the Xilinx XC7Z020 and XC7Z045, achieving a 3.65x speedup in end-to-end inference time on ImageNet compared with the fixed-point quantization method.
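To make the intra-layer idea concrete, the sketch below quantizes different row groups of a single layer's weight matrix at different bit-widths. This is a minimal illustration, not the paper's exact scheme: the symmetric uniform quantizer, the row-group assignment, and the function names are all assumptions introduced here for clarity.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform fixed-point quantization to the given bit-width.

    Maps values onto 2**(bits-1) - 1 levels per sign, a common baseline
    quantizer (an illustrative choice, not necessarily the paper's).
    """
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return w.copy()
    return np.round(w / scale) * scale

def intra_layer_multi_precision(weights, bit_assignment):
    """Quantize each row group of one layer at its own precision.

    `bit_assignment` is a list of (row_slice, bits) pairs, so multiple
    precisions coexist *within* a single layer, in contrast to
    inter-layer schemes that fix one precision per layer.
    """
    q = np.empty_like(weights)
    for rows, bits in bit_assignment:
        q[rows] = quantize_uniform(weights[rows], bits)
    return q

# Hypothetical example: an 8x4 weight matrix with the first half of the
# rows quantized to 4 bits and the second half to 8 bits.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4))
wq = intra_layer_multi_precision(w, [(slice(0, 4), 4), (slice(4, 8), 8)])
```

Because every layer can contain the same mix of row-group precisions, the same hardware compute unit configuration can serve all layers, which is the overhead reduction the abstract describes.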


MSP: An FPGA-Specific Mixed-Scheme, Multi-Precision Deep Neural Network Quantization Framework

With the tremendous success of deep learning, there exists imminent need...

RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions

This work proposes a novel Deep Neural Network (DNN) quantization framew...

Technical Report: NEMO DNN Quantization for Deployment Model

This technical report aims at defining a formal framework for Deep Neura...

Implementation of the Logistic Map with FPGA using 32 bits fixed point standard

This article presents a design of the logistic map by means of FPGA (Fie...

Robustness of Neural Networks to Parameter Quantization

Quantization, a commonly used technique to reduce the memory footprint o...

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Deep Neural Networks (DNNs) have achieved extraordinary performance in v...

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations

We propose precision gating (PG), an end-to-end trainable dynamic dual-p...