Deep learning model compression using network sensitivity and gradients

10/11/2022
by Madhumitha Sakthi, et al.

Deep learning model compression is an increasingly important field for the edge deployment of deep learning models. Given the growing size of models and their corresponding power consumption, it is vital to reduce model size and compute requirements without a significant drop in the model's performance. In this paper, we present model compression algorithms for both non-retraining and retraining conditions. In the first case, where retraining is not feasible because the original data or the necessary compute resources are unavailable and only off-the-shelf models are accessible, we propose the Bin Quant algorithm, which compresses deep learning models using the sensitivity of the network parameters. This results in 13x compression of the speech command and control model and 7x compression of the DeepSpeech2 model. In the second case, where models can be retrained and maximum compression is required with negligible loss in accuracy, we propose our novel gradient-weighted k-means clustering algorithm (GWK). This method uses gradients to identify the important weight values in a given cluster and nudges the centroid towards those values, thereby giving importance to sensitive weights. Our method effectively combines product quantization with the EWGS [1] algorithm for sub-1-bit representation of the quantized models. We test our GWK algorithm on the CIFAR10 dataset across a range of models such as ResNet20, ResNet56, and MobileNetv2, and show 35x compression on quantized models for less than a 2% loss in accuracy.
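The abstract describes GWK only at a high level. As a rough illustration of the core idea, here is a minimal NumPy sketch of one gradient-weighted centroid update; using gradient magnitudes directly as per-weight importance scores and the `gwk_step` helper are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def gwk_step(weights, grads, centroids):
    """One gradient-weighted k-means step (illustrative sketch).

    weights:   flat array of layer weights, shape (n,)
    grads:     gradients for each weight, shape (n,)
    centroids: current cluster centers, shape (k,)
    """
    # Assign each weight to its nearest centroid.
    assign = np.argmin(np.abs(weights[:, None] - centroids[None, :]), axis=1)

    new_centroids = centroids.copy()
    for c in range(len(centroids)):
        mask = assign == c
        if not mask.any():
            continue
        # Gradient magnitudes act as per-weight importance scores
        # (an assumption here), so the centroid is nudged toward
        # sensitive weights rather than set to the plain cluster mean.
        imp = np.abs(grads[mask])
        new_centroids[c] = np.average(weights[mask], weights=imp + 1e-12)
    return new_centroids, assign

# Example: cluster 1,000 random weights into a 16-entry codebook.
rng = np.random.default_rng(0)
w = rng.normal(size=1000)
g = rng.normal(size=1000)
c = np.linspace(w.min(), w.max(), 16)
for _ in range(10):
    c, assign = gwk_step(w, g, c)
quantized = c[assign]  # each weight replaced by its centroid
```

After the clustering converges, only the small centroid codebook and the per-weight cluster indices need to be stored, which is where the compression comes from.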

