FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation

by Ahmad Shawahna et al.

Deep neural networks (DNNs) have demonstrated their effectiveness in a wide range of computer vision tasks, with state-of-the-art results obtained through complex and deep structures that require intensive computation and memory. Nowadays, efficient model inference is crucial for consumer applications on resource-constrained platforms. As a result, there is much interest in the research and development of dedicated deep learning (DL) hardware to improve the throughput and energy efficiency of DNNs. Low-precision representation of DNN data-structures through quantization would bring great benefits to specialized DL hardware; however, aggressive quantization leads to a severe accuracy drop. As such, quantization opens a large hyper-parameter space of bit-precision levels, the exploration of which is a major challenge. In this paper, we propose a novel framework, referred to as the Fixed-Point Quantizer of deep neural Networks (FxP-QNet), that flexibly designs a mixed low-precision DNN for integer-arithmetic-only deployment. Specifically, FxP-QNet gradually adapts the quantization level for each data-structure of each layer based on the trade-off between the network accuracy and the low-precision requirements. Additionally, it employs post-training self-distillation and network prediction error statistics to optimize the quantization of floating-point values into fixed-point numbers. Evaluating FxP-QNet on state-of-the-art architectures and the benchmark ImageNet dataset, we empirically demonstrate its effectiveness in achieving the accuracy-compression trade-off without the need for training. The results show that FxP-QNet-quantized AlexNet, VGG-16, and ResNet-18 reduce the overall memory requirements of their full-precision counterparts by 7.16x, 10.36x, and 6.44x with less than 0.95% accuracy drop.
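
To make the dynamic fixed-point idea concrete, below is a minimal sketch (not the authors' implementation; the function names quantize_fixed_point and best_frac_bits are illustrative) of mapping a floating-point tensor onto an n-bit two's-complement fixed-point grid, where the fractional length is chosen per data-structure by minimizing quantization error:

```python
import numpy as np

def quantize_fixed_point(x, total_bits, frac_bits):
    """Round x to an n-bit two's-complement fixed-point grid with
    `frac_bits` fractional bits, then dequantize back to float."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))       # most negative integer code
    qmax = 2 ** (total_bits - 1) - 1      # most positive integer code
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale                      # dequantized values

def best_frac_bits(x, total_bits):
    """Pick the fractional length that minimizes mean squared
    quantization error for this tensor (a per-layer 'dynamic' choice)."""
    candidates = range(total_bits + 1)
    errors = [np.mean((x - quantize_fixed_point(x, total_bits, f)) ** 2)
              for f in candidates]
    return min(zip(errors, candidates))[1]

# Example: quantize a synthetic weight tensor to 8-bit dynamic fixed-point.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)
f = best_frac_bits(w, total_bits=8)
w_q = quantize_fixed_point(w, total_bits=8, frac_bits=f)
print(f"fractional bits: {f}, MSE: {np.mean((w - w_q) ** 2):.3e}")
```

In the full framework, an error-driven choice of this kind would be paired with the greedy per-layer search described in the abstract, which progressively lowers each data-structure's bit-width while the resulting accuracy degradation remains acceptable.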

Related Research

- Cheetah: Mixed Low-Precision Hardware Software Co-Design Framework for DNNs on the Edge
- BinaryConnect: Training Deep Neural Networks with binary weights during propagations
- Mixed Precision Training of Convolutional Neural Networks using Integer Operations
- Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update
- DNN Feature Map Compression using Learned Representation over GF(2)
- ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
- MARViN – Multiple Arithmetic Resolutions Vacillating in Neural Networks