Quantization Aware Factorization for Deep Neural Network Compression

08/08/2023
by Daria Cherniuk, et al.

Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce the number of parameters and FLOPs in neural networks. Due to the memory and power constraints of mobile and embedded devices, a quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while preserving the prediction quality of the model. Namely, we propose to use the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with the devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance tradeoff.
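The core idea of the abstract, CP decomposition whose factors are constrained to a quantization grid via an ADMM splitting, can be illustrated with a short sketch. The code below is not the authors' implementation: the function names (admm_quantized_cp, quantize) and the hyperparameters (scale, n_levels, rho, rank) are illustrative assumptions. It alternates a regularized least-squares update of each continuous factor, a projection of its auxiliary copy onto a uniform grid, and a dual update, which is the standard ADMM split applied to the constraint that factor entries lie on the grid.

```python
# A minimal sketch (not the paper's released code) of ADMM-based CP
# decomposition with factors projected onto a uniform quantization grid.
# Hyperparameters (scale, n_levels, rho, rank) are illustrative assumptions.
import numpy as np


def quantize(x, scale, n_levels):
    # Project onto a symmetric uniform grid {-q, ..., 0, ..., q} * scale.
    q = (n_levels - 1) // 2
    return scale * np.clip(np.round(x / scale), -q, q)


def unfold(tensor, mode):
    # Mode-n unfolding of a 3-way tensor (C-order columns).
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def khatri_rao(a, b):
    # Column-wise Khatri-Rao product of (I, R) and (J, R) -> (I * J, R).
    return np.einsum('ir,jr->ijr', a, b).reshape(-1, a.shape[1])


def admm_quantized_cp(T, rank, scale=0.05, n_levels=256, rho=1.0, n_iters=50, seed=0):
    # CP approximation T ~ [[A0, A1, A2]] with each factor constrained to the grid.
    # Per mode we keep a continuous factor A, a quantized copy Z and a dual U:
    #   A-step: ridge-regularized least squares against the unfolded tensor,
    #   Z-step: projection of A + U onto the quantization grid,
    #   U-step: dual ascent on the constraint A = Z.
    rng = np.random.default_rng(seed)
    A = [rng.standard_normal((d, rank)) * scale for d in T.shape]
    Z = [quantize(a, scale, n_levels) for a in A]
    U = [np.zeros_like(a) for a in A]

    for _ in range(n_iters):
        for n in range(3):
            others = [A[m] for m in range(3) if m != n]
            K = khatri_rao(others[0], others[1])           # product of the other two factors
            G = K.T @ K + rho * np.eye(rank)               # Gram matrix plus ADMM penalty
            rhs = unfold(T, n) @ K + rho * (Z[n] - U[n])
            A[n] = np.linalg.solve(G, rhs.T).T             # regularized LS factor update
            Z[n] = quantize(A[n] + U[n], scale, n_levels)  # projection onto the grid
            U[n] += A[n] - Z[n]                            # dual update
    return Z  # quantized factors, used in place of the dense weight at inference
```

In a deployment setting, the quantized factors Z would replace the dense convolutional or fully-connected weight at inference time; the grid parameters here mimic a symmetric 8-bit scheme but are placeholders rather than values taken from the paper.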
