VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

05/18/2020
by Cheng Gong, et al.

Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs. However, the trade-off between the quantization bitwidth and final accuracy is complex and non-convex, which makes it difficult to optimize directly. Minimizing the direct quantization loss (DQL) of the coefficient data is an effective local optimization method, but previous works often neglect accurate control of the DQL, resulting in a higher loss of final DNN model accuracy. In this paper, we propose a novel metric called Vector Loss. Based on this new metric, we develop a new quantization solution called VecQ, which guarantees minimal direct quantization loss and better model accuracy. In addition, to speed up the proposed quantization process during model training, we accelerate it with a parameterized probability estimation method and template-based derivative calculation. We evaluate the proposed algorithm on the MNIST, CIFAR, ImageNet, IMDB movie review and THUCNews text data sets with multiple DNN models. The results demonstrate that our quantization solution is more accurate and effective than state-of-the-art approaches while offering more flexible bitwidth support. Moreover, evaluation of our quantized models on Salient Object Detection (SOD) tasks shows that they maintain comparable feature extraction quality with up to 16× weight size reduction.
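To make the idea of a vector-level quantization loss concrete, below is a minimal, illustrative NumPy sketch. It is not the VecQ algorithm from the paper: the function name, the brute-force search over the scaling factor alpha, and the symmetric k-bit grid are assumptions made only for illustration, and the paper's parameterized probability estimation and template-based derivative calculation are not modeled. The sketch only shows the core contrast with per-weight rounding: all of a layer's weights are treated as one vector, and the scale is chosen to minimize the distance between the original and quantized vectors.

```python
import numpy as np

def quantize_layer_weights(w, bitwidth=4, num_alpha_steps=200):
    """Toy vectorized quantization of one layer's weights.

    Treats all weights as a single vector and searches for a scaling
    factor alpha so that the quantized vector alpha * q (q integer,
    symmetric k-bit) stays as close as possible to the original
    vector, i.e. it minimizes a vector-level quantization loss rather
    than rounding each weight independently.
    """
    v = w.ravel().astype(np.float64)
    qmax = 2 ** (bitwidth - 1) - 1            # symmetric k-bit integer levels
    best = (np.inf, None, None)

    # Brute-force search over candidate scaling factors (illustration only;
    # VecQ derives its solution analytically during training).
    for alpha in np.linspace(1e-8, np.abs(v).max(), num_alpha_steps):
        q = np.clip(np.round(v / alpha), -qmax, qmax)
        loss = np.linalg.norm(v - alpha * q)  # distance between weight vectors
        if loss < best[0]:
            best = (loss, alpha, q)

    loss, alpha, q = best
    return (alpha * q).reshape(w.shape), alpha, loss

# Example: quantize a random 3x3x16x32 conv kernel to 4 bits.
w = np.random.randn(3, 3, 16, 32).astype(np.float32)
w_q, alpha, loss = quantize_layer_weights(w, bitwidth=4)
print(f"alpha = {alpha:.4f}, vector loss = {loss:.4f}")
```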


Related research:

05/02/2019 · Toward Extremely Low Bit and Lossless Accuracy in DNNs with Progressive ADMM
10/30/2016 · Accurate Deep Representation Quantization with Gradient Snapping Layer for Similarity Search
10/31/2022 · Model Compression for DNN-Based Text-Independent Speaker Verification Using Weight Quantization
02/02/2020 · SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks
02/09/2021 · Distribution Adaptive INT8 Quantization for Training CNNs
05/30/2019 · Quantization Loss Re-Learning Method
07/13/2020 · Term Revealing: Furthering Quantization at Run Time on Quantized DNNs
