Towards Optimal Compression: Joint Pruning and Quantization

02/15/2023
by Ben Zandonati, et al.

Compression of deep neural networks has become a necessary stage for optimizing model inference on resource-constrained hardware. This paper presents FITCompress, a method that unifies layer-wise mixed-precision quantization and pruning under a single heuristic, as an alternative to neural architecture search and Bayesian techniques. FITCompress combines the Fisher Information Metric with path planning through compression space to pick optimal configurations under given size and operation constraints, followed by single-shot fine-tuning. Experiments on ImageNet validate the method and show that our approach yields a better trade-off between accuracy and efficiency than the baselines. Beyond computer vision benchmarks, we also experiment with the BERT model on a language understanding task, paving the way towards its optimal compression.
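To make the idea in the abstract more concrete, the following is a minimal sketch, not the FITCompress implementation. It assumes a toy logistic-regression "model", a diagonal empirical Fisher estimate as the sensitivity weight, and a plain greedy search standing in for the path planning the abstract mentions. All function names (`empirical_fisher_diag`, `greedy_path`, and so on), the size measure, and the step sizes are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def empirical_fisher_diag(weights, inputs, labels):
    """Mean squared per-sample gradient of the logistic log-likelihood."""
    fisher = [0.0] * len(weights)
    for x, y in zip(inputs, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for i, xi in enumerate(x):
            g = (y - p) * xi  # d log p(y|x) / d w_i
            fisher[i] += g * g
    return [f / len(inputs) for f in fisher]


def prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of the weights."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize(weights, bits):
    """Uniform symmetric quantization (assumes bits >= 2)."""
    scale = max(abs(w) for w in weights) or 1.0
    levels = 2 ** (bits - 1) - 1
    return [round(w / scale * levels) / levels * scale for w in weights]


def compress(weights, bits, sparsity):
    return quantize(prune(weights, sparsity), bits)


def sensitivity(weights, compressed, fisher):
    """Fisher-weighted squared perturbation caused by compression."""
    return sum(f * (w - c) ** 2 for w, c, f in zip(weights, compressed, fisher))


def model_size_bits(compressed, bits):
    """Toy size measure (an assumption): non-zero weights times bit width."""
    return sum(1 for c in compressed if c != 0.0) * bits


def greedy_path(weights, fisher, budget_bits,
                start_bits=8, min_bits=2, sparsity_step=0.25):
    """Greedy stand-in for path planning: take the cheapest step each time."""
    bits, sparsity = start_bits, 0.0
    path = [(bits, sparsity)]
    while model_size_bits(compress(weights, bits, sparsity), bits) > budget_bits:
        candidates = []
        if bits > min_bits:
            candidates.append((bits - 1, sparsity))
        if sparsity + sparsity_step < 1.0:
            candidates.append((bits, sparsity + sparsity_step))
        if not candidates:
            break  # no further compression step is available
        bits, sparsity = min(
            candidates,
            key=lambda c: sensitivity(weights, compress(weights, *c), fisher),
        )
        path.append((bits, sparsity))
    return path


# Synthetic data, purely for illustration.
random.seed(0)
inputs = [[random.gauss(0, 1) for _ in range(8)] for _ in range(64)]
weights = [random.gauss(0, 1) for _ in range(8)]
labels = [1.0 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0.0
          for x in inputs]

fisher = empirical_fisher_diag(weights, inputs, labels)
path = greedy_path(weights, fisher, budget_bits=16)
print(path)  # sequence of (bits, sparsity) configurations visited
```

The sketch treats the whole weight vector as a single layer. A layer-wise variant would keep one `(bits, sparsity)` pair per layer and sum the per-layer sensitivities, which is closer in spirit to the layer-wise mixed-precision setting the abstract describes.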

research
07/20/2020

Differentiable Joint Pruning and Quantization for Hardware Efficiency

We present a differentiable joint pruning and quantization (DJPQ) scheme...
research
09/10/2019

Differentiable Mask Pruning for Neural Networks

Pruning of neural networks is one of the well-known and promising model ...
research
12/30/2021

Automatic Mixed-Precision Quantization Search of BERT

Pre-trained language models such as BERT have shown remarkable effective...
research
11/12/2020

Automated Model Compression by Jointly Applied Pruning and Quantization

In the traditional deep compression framework, iteratively performing ne...
research
06/09/2023

End-to-End Neural Network Compression via ℓ_1/ℓ_2 Regularized Latency Surrogates

Neural network (NN) compression via techniques such as pruning, quantiza...
research
12/04/2019

Deep Model Compression via Deep Reinforcement Learning

Besides accuracy, the storage of convolutional neural networks (CNN) mod...
research
08/19/2021

An Information Theory-inspired Strategy for Automatic Network Pruning

Despite superior performance on many computer vision tasks, deep convolu...
