OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

05/23/2022
by Peng Hu, et al.

Deep Neural Networks (DNNs) are usually overparameterized, with millions of weight parameters, which makes it challenging to deploy these large models on resource-constrained hardware platforms such as smartphones. Numerous network compression methods, such as pruning and quantization, have been proposed to significantly reduce the model size; the key is to find a suitable compression allocation (e.g., pruning sparsity and quantization codebook) for each layer. Existing solutions obtain the compression allocation in an iterative/manual fashion while finetuning the compressed model, and thus suffer from an efficiency issue. Departing from prior art, we propose a novel One-shot Pruning-Quantization (OPQ) method that analytically solves the compression allocation using the pre-trained weight parameters only. During finetuning, the compression module is fixed and only the weight parameters are updated. To our knowledge, OPQ is the first work to reveal that a pre-trained model is sufficient for solving pruning and quantization simultaneously, without any complex iterative/manual optimization at the finetuning stage. Furthermore, we propose a unified channel-wise quantization method that enforces all channels of each layer to share a common codebook, which leads to a low bit-rate allocation without the extra overhead introduced by traditional channel-wise quantization. Comprehensive experiments on ImageNet with AlexNet/MobileNet-V1/ResNet-50 show that our method improves accuracy and training efficiency while obtaining significantly higher compression rates than the state-of-the-art.

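To make the two ingredients above concrete, the NumPy sketch below pairs one-shot magnitude pruning with a per-layer shared codebook. This is an illustration under stated assumptions, not the paper's method: OPQ derives each layer's sparsity and codebook analytically from the pre-trained weights, whereas here the sparsity, the bit-width, and the function names one_shot_prune and unified_channelwise_quantize are hypothetical choices made for the example.

import numpy as np

def one_shot_prune(weights, sparsity):
    # Zero the smallest-magnitude weights in a single shot (no iterations).
    # OPQ computes `sparsity` analytically per layer; a fixed value is assumed here.
    if sparsity <= 0.0:
        return weights
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) > threshold, weights, 0.0)

def unified_channelwise_quantize(weights, bits):
    # Channel-wise quantization with one codebook shared by all channels:
    # each output channel is rescaled into [-1, 1], then snapped to a single
    # symmetric uniform grid, so only one scalar scale per channel is stored
    # instead of a full per-channel codebook.
    flat = weights.reshape(weights.shape[0], -1)        # (out_channels, rest)
    scales = np.abs(flat).max(axis=1, keepdims=True)    # per-channel scale
    scales[scales == 0.0] = 1.0                         # guard all-zero channels
    normalized = flat / scales                          # every channel in [-1, 1]
    levels = 2 ** bits - 1                              # odd count keeps exact zero
    step = 2.0 / (levels - 1)                           # shared uniform codebook
    quantized = np.round(normalized / step) * step      # pruned zeros stay zero
    return (quantized * scales).reshape(weights.shape)

# Example: compress one convolutional layer's pre-trained weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3)).astype(np.float32)  # (out, in, kh, kw)
w_compressed = unified_channelwise_quantize(one_shot_prune(w, sparsity=0.7), bits=4)
print("nonzero fraction:", np.count_nonzero(w_compressed) / w.size)

Note that the shared codebook contains an exact zero level, so the sparsity produced by the pruning step survives quantization unchanged; this is what lets the two operations be composed in a single shot before finetuning.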