
Quantized Sparse Weight Decomposition for Neural Network Compression

by Andrey Kuzmin, et al.

In this paper, we introduce a novel method of neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to find quantized and sparse factorizations of the weight tensors. We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA. Combined with end-to-end fine-tuning, our method exceeds or matches previous state-of-the-art methods in the trade-off between accuracy and model size. Unlike vector quantization, our method is applicable in both moderate and extreme compression regimes.
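The core idea of the abstract, approximating a weight matrix by the product of two factors that are kept sparse and quantized via projected gradient descent, can be sketched as follows. This is an illustrative NumPy toy, not the paper's exact algorithm: the projection (top-magnitude sparsification plus uniform quantization), the rank, the learning rate, and the alternating update order are all assumptions made here for demonstration.

```python
import numpy as np

def project_sparse_quantized(M, sparsity=0.5, n_levels=16):
    """Project M onto sparse matrices with uniformly quantized entries
    (illustrative projection; the paper's constraint set may differ)."""
    # Sparsity projection: keep only the largest-magnitude entries.
    k = int(M.size * (1.0 - sparsity))
    thresh = np.sort(np.abs(M).ravel())[-k] if k > 0 else np.inf
    mask = np.abs(M) >= thresh
    # Uniform quantization of the surviving entries.
    scale = np.abs(M).max() / (n_levels / 2) + 1e-12
    return (np.round(M / scale) * scale) * mask

def factorize(W, rank=8, steps=200, lr=1e-2, seed=0):
    """Projected gradient descent on 0.5 * ||A @ B - W||_F^2, projecting
    both factors onto the sparse, quantized set after each step."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    A = rng.normal(scale=0.1, size=(m, rank))
    B = rng.normal(scale=0.1, size=(rank, n))
    for _ in range(steps):
        R = A @ B - W                    # reconstruction residual
        A = A - lr * (R @ B.T)           # gradient step on A
        B = B - lr * (A.T @ R)           # gradient step on B
        A = project_sparse_quantized(A)  # project onto the constraint set
        B = project_sparse_quantized(B)
    return A, B
```

At inference time, only the compact factors `A` and `B` would be stored, and `A @ B` would be recomputed on the fly to recover the layer's weights, which is the storage/compute trade-off the abstract describes.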


Compressing Weight-updates for Image Artifacts Removal Neural Networks

In this paper, we present a novel approach for fine-tuning a decoder-sid...

Position-based Scaled Gradient for Model Quantization and Sparse Training

We propose the position-based scaled gradient (PSG) that scales the grad...

Approximate search with quantized sparse representations

This paper tackles the task of storing a large collection of vectors, su...

On the Universal Approximability of Quantized ReLU Neural Networks

Compression is a key step to deploy large neural networks on resource-co...

Channel Pruning In Quantization-aware Training: An Adaptive Projection-gradient Descent-shrinkage-splitting Method

We propose an adaptive projection-gradient descent-shrinkage-splitting m...

HEMP: High-order Entropy Minimization for neural network comPression

We formulate the entropy of a quantized artificial neural network as a d...

Training with Quantization Noise for Extreme Model Compression

We tackle the problem of producing compact models, maximizing their accu...