
Quantized Sparse Weight Decomposition for Neural Network Compression

by Andrey Kuzmin, et al.

In this paper, we introduce a novel method of neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to find quantized and sparse factorizations of the weight tensors. We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA. Combined with end-to-end fine-tuning, our method exceeds or matches previous state-of-the-art methods in the trade-off between accuracy and model size. Unlike vector quantization, our method is applicable in both moderate and extreme compression regimes.
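The core idea of the abstract, approximating a weight matrix by the product of two factors that are kept sparse and quantized via projected gradient descent, can be sketched as follows. This is an illustrative NumPy toy, not the paper's exact algorithm: the projection (top-magnitude sparsification plus uniform quantization), the rank, the learning rate, and the alternating update order are all assumptions made here for demonstration.

```python
import numpy as np

def project_sparse_quantized(M, sparsity=0.5, n_levels=16):
    """Project M onto sparse matrices with uniformly quantized entries
    (illustrative projection; the paper's constraint set may differ)."""
    # Sparsity projection: keep only the largest-magnitude entries.
    k = int(M.size * (1.0 - sparsity))
    thresh = np.sort(np.abs(M).ravel())[-k] if k > 0 else np.inf
    mask = np.abs(M) >= thresh
    # Uniform quantization of the surviving entries.
    scale = np.abs(M).max() / (n_levels / 2) + 1e-12
    return (np.round(M / scale) * scale) * mask

def factorize(W, rank=8, steps=200, lr=1e-2, seed=0):
    """Projected gradient descent on 0.5 * ||A @ B - W||_F^2, projecting
    both factors onto the sparse, quantized set after each step."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    A = rng.normal(scale=0.1, size=(m, rank))
    B = rng.normal(scale=0.1, size=(rank, n))
    for _ in range(steps):
        R = A @ B - W                    # reconstruction residual
        A = A - lr * (R @ B.T)           # gradient step on A
        B = B - lr * (A.T @ R)           # gradient step on B
        A = project_sparse_quantized(A)  # project onto the constraint set
        B = project_sparse_quantized(B)
    return A, B
```

At inference time, only the compact factors `A` and `B` would be stored, and `A @ B` would be recomputed on the fly to recover the layer's weights, which is the storage/compute trade-off the abstract describes.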


Compressing Weight-updates for Image Artifacts Removal Neural Networks

In this paper, we present a novel approach for fine-tuning a decoder-sid...

Position-based Scaled Gradient for Model Quantization and Sparse Training

We propose the position-based scaled gradient (PSG) that scales the grad...

Approximate search with quantized sparse representations

This paper tackles the task of storing a large collection of vectors, su...

On the Universal Approximability of Quantized ReLU Neural Networks

Compression is a key step to deploy large neural networks on resource-co...

Channel Pruning In Quantization-aware Training: An Adaptive Projection-gradient Descent-shrinkage-splitting Method

We propose an adaptive projection-gradient descent-shrinkage-splitting m...

HEMP: High-order Entropy Minimization for neural network comPression

We formulate the entropy of a quantized artificial neural network as a d...

Training with Quantization Noise for Extreme Model Compression

We tackle the problem of producing compact models, maximizing their accu...