Rotation Invariant Quantization for Model Compression

03/03/2023
by Joseph Kampeas et al.

Post-training Neural Network (NN) model compression is an attractive approach for deploying large models on memory-constrained devices. In this study, we investigate the rate-distortion tradeoff for NN model compression. First, we propose a Rotation-Invariant Quantization (RIQ) technique that uses a single parameter to quantize the entire NN model, yielding a different rate at each layer, i.e., mixed-precision quantization. Then, we prove that our rotation-invariant approach is optimal in terms of compression. We rigorously evaluate RIQ and demonstrate its capabilities on various models and tasks. For example, RIQ achieves 19.4× and 52.9× compression ratios on pre-trained VGG dense and pruned models, respectively, with less than 0.4% accuracy degradation. Code: <https://github.com/ehaleva/RIQ>.
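
The sketch below illustrates the general idea described in the abstract: single-parameter, mixed-precision post-training quantization, where one global knob governs every layer and each layer's quantization step is scaled by a rotation-invariant statistic of its weights, so different layers end up at different bit rates. The function names (`riq_style_quantize`), the RMS-norm scaling, and the parameter `gamma` are illustrative assumptions for this sketch, not the paper's actual criterion; see the linked repository for the authors' implementation.

```python
import numpy as np

def quantize_layer(weights, step):
    """Uniformly quantize a weight tensor with the given step size."""
    q = np.round(weights / step)      # integer-valued quantization indices
    return q * step, q                # dequantized weights, indices

def riq_style_quantize(layers, gamma=0.01):
    """Sketch of single-parameter, mixed-precision post-training quantization.

    `gamma` is the single global parameter (an assumption for this sketch).
    Each layer's step size is scaled by a rotation-invariant statistic of its
    weights (here the RMS norm, an illustrative choice), so different layers
    naturally end up at different rates.
    """
    out = {}
    for name, w in layers.items():
        scale = np.sqrt(np.mean(w ** 2))   # depends only on the weight norm
        step = gamma * scale
        deq, q = quantize_layer(w, step)
        # Rough per-layer rate estimate from the index range (illustrative only)
        bits = int(np.ceil(np.log2(q.max() - q.min() + 1)))
        out[name] = (deq, step, bits)
    return out

# Toy usage with random "layers"
rng = np.random.default_rng(0)
layers = {"conv1": rng.normal(0, 0.1, (64, 3, 3, 3)),
          "fc": rng.normal(0, 0.02, (10, 512))}
for name, (deq, step, bits) in riq_style_quantize(layers).items():
    print(f"{name}: step={step:.4g}, ~{bits} bits per weight")
```

Note how the per-layer rate falls out of a single global parameter: layers with smaller weight magnitudes get finer steps in absolute terms but a similar relative distortion, which is the intuition behind mixed-precision quantization driven by one knob.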

Related research:

- Differentiable Fine-grained Quantization for Deep Neural Network Compression (10/20/2018): Neural networks have shown great performance in cognitive tasks. When de...
- Successive Pruning for Model Compression via Rate Distortion Theory (02/16/2021): Neural network (NN) compression has become essential to enable deploying...
- Neural Network Activation Quantization with Bitwise Information Bottlenecks (06/09/2020): Recent research on the information bottleneck sheds new light on the contin...
- Loss Aware Post-training Quantization (11/17/2019): Neural network quantization enables the deployment of large models on re...
- Estimating the Resize Parameter in End-to-end Learned Image Compression (04/26/2022): We describe a search-free resizing framework that can further improve th...
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (10/31/2022): Generative Pre-trained Transformer (GPT) models set themselves apart thr...
- MobileNMT: Enabling Translation in 15MB and 30ms (06/07/2023): Deploying NMT models on mobile devices is essential for privacy, low lat...
