Retraining-Based Iterative Weight Quantization for Deep Neural Networks

05/29/2018
by Dongsoo Lee, et al.

Model compression has gained a lot of attention due to its ability to significantly reduce hardware resource requirements while maintaining the accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural networks, where a smaller memory footprint is crucial not only for reducing storage requirements but also for fast inference. Quantization is known to be an effective model compression method, and researchers are interested in minimizing the number of bits needed to represent parameters. In this work, we introduce an iterative quantization technique that achieves a high compression ratio without any modifications to the training algorithm. In the proposed technique, weight quantization is followed by retraining the model with full-precision weights. We show that iterative retraining generates new sets of weights that can be quantized with decreasing quantization loss at each iteration. We also show that quantization can efficiently leverage pruning, another effective model compression method, and we address implementation issues that arise when combining the two. Our experimental results demonstrate that an LSTM model using 1-bit quantized weights is sufficient for the PTB dataset without any accuracy degradation, while previous methods demand at least 2-4 bits for quantized weights.
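To make the quantize-then-retrain loop concrete, the sketch below is a minimal PyTorch rendering of the idea under stated assumptions, not the paper's exact implementation: `train_one_pass` is a hypothetical hook standing in for the unmodified training routine, the 1-bit rule (sign times mean absolute value) is a common closed-form binarization, and the 90% sparsity in the pruning helper is an illustrative choice.

```python
import torch


def binarize(weight):
    # 1-bit quantization: keep only the sign of each weight, scaled by
    # the mean absolute value (a common closed-form scale that reduces
    # the L2 quantization error for binary codes).
    alpha = weight.abs().mean()
    return alpha * weight.sign()


def iterative_quantized_retraining(model, train_one_pass, num_iterations=5):
    # Alternate full-precision retraining with weight quantization.
    # `train_one_pass(model)` is a hypothetical hook that runs ordinary
    # (unmodified) training and returns the trained model.
    for it in range(num_iterations):
        # 1. Retrain with full-precision weights, starting from the
        #    quantized weights produced by the previous iteration.
        model = train_one_pass(model)

        # 2. Quantize every weight tensor in place and track the
        #    quantization error; the paper's claim is that this error
        #    shrinks from one quantize-retrain iteration to the next.
        q_error = 0.0
        with torch.no_grad():
            for p in model.parameters():
                q = binarize(p.data)
                q_error += (p.data - q).pow(2).sum().item()
                p.data.copy_(q)
        print(f"iteration {it}: total quantization error {q_error:.4f}")
    return model


def prune_then_binarize(weight, sparsity=0.9):
    # One way to combine the two compression methods: magnitude-prune
    # first, then binarize only the surviving weights. Pruned positions
    # stay exactly zero, and the scale alpha is computed over the
    # unpruned weights only.
    k = max(1, int(weight.numel() * sparsity))
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()
    alpha = (weight.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return alpha * weight.sign() * mask
```

The key design choice the sketch captures is that each full-precision retraining pass resumes from the quantized weights rather than from scratch, which is what steers successive iterations toward weight sets that the binary approximation represents with less and less loss.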

