Quantized Sparse Weight Decomposition for Neural Network Compression

07/22/2022
by Andrey Kuzmin et al.

In this paper, we introduce a novel method for neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to find the quantized and sparse factorization of the weight tensors. We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA. Combined with end-to-end fine-tuning, our method exceeds or is on par with previous state-of-the-art methods in terms of the trade-off between accuracy and model size. Unlike vector quantization, our method is applicable in both moderate and extreme compression regimes.
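To make the idea of projected gradient descent on quantized, sparse factors concrete, the following NumPy sketch alternates gradient steps on a two-factor decomposition W ≈ AB with projections of one factor onto a small scalar codebook and the other onto a fixed sparsity level. This is a minimal illustration, not the paper's algorithm: the function names (quantized_sparse_factorization, project_to_codebook, project_to_sparse), the uniform scalar codebook, the magnitude-thresholding projection, the SVD initialization, and all hyperparameters are assumptions made here for illustration.

```python
# Illustrative sketch only: alternating projected gradient descent for a
# quantized, sparse two-factor decomposition W ~= A @ B. All shapes,
# projections, and hyperparameters below are hypothetical.
import numpy as np

def project_to_codebook(M, num_levels=16):
    """Snap entries of M to a uniform scalar codebook (a stand-in for the
    paper's quantization; the uniform grid is an assumption)."""
    lo, hi = M.min(), M.max()
    if hi == lo:
        return M.copy()
    levels = np.linspace(lo, hi, num_levels)
    idx = np.abs(M[..., None] - levels).argmin(axis=-1)
    return levels[idx]

def project_to_sparse(M, keep_ratio=0.25):
    """Keep only the largest-magnitude entries (hard-thresholding projection)."""
    k = max(1, int(keep_ratio * M.size))
    thresh = np.sort(np.abs(M).ravel())[-k]
    out = M.copy()
    out[np.abs(out) < thresh] = 0.0
    return out

def quantized_sparse_factorization(W, rank=32, steps=500, lr=1e-3,
                                   num_levels=16, keep_ratio=0.25):
    """Find A (quantized) and B (sparse) with W ~= A @ B via projected GD."""
    # Initialize from a truncated SVD (an assumption; ties the sketch to weight SVD).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * np.sqrt(s[:rank])
    B = np.sqrt(s[:rank])[:, None] * Vt[:rank]
    for _ in range(steps):
        R = A @ B - W                 # reconstruction residual
        grad_A = R @ B.T              # gradient of 0.5 * ||A @ B - W||_F^2 w.r.t. A
        grad_B = A.T @ R              # gradient w.r.t. B
        A = project_to_codebook(A - lr * grad_A, num_levels)  # step, then project
        B = project_to_sparse(B - lr * grad_B, keep_ratio)
    return A, B

# At inference time, the dense weight is rebuilt on the fly from the stored factors.
W = np.random.default_rng(1).standard_normal((64, 128))
A, B = quantized_sparse_factorization(W)
W_hat = A @ B   # reconstructed weight used by the layer
```

Only the quantized factor A (codebook indices) and the sparse factor B (nonzero entries plus positions) would need to be stored, which is where the compression comes from in this sketch.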

Related research

05/10/2019 · Compressing Weight-updates for Image Artifacts Removal Neural Networks
In this paper, we present a novel approach for fine-tuning a decoder-sid...

05/22/2020 · Position-based Scaled Gradient for Model Quantization and Sparse Training
We propose the position-based scaled gradient (PSG) that scales the grad...

08/10/2016 · Approximate search with quantized sparse representations
This paper tackles the task of storing a large collection of vectors, su...

02/10/2018 · On the Universal Approximability of Quantized ReLU Neural Networks
Compression is a key step to deploy large neural networks on resource-co...

04/09/2022 · Channel Pruning In Quantization-aware Training: An Adaptive Projection-gradient Descent-shrinkage-splitting Method
We propose an adaptive projection-gradient descent-shrinkage-splitting m...

07/12/2021 · HEMP: High-order Entropy Minimization for neural network comPression
We formulate the entropy of a quantized artificial neural network as a d...

06/22/2020 · Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization
Pruning and quantization are proven methods for improving the performanc...
