Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization

02/05/2019
by Eldad Meller, et al.

Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom present in most networks: for a given layer, individual output channels can be scaled by any positive factor provided that the corresponding weights of the next layer are inversely scaled. A given network therefore admits many factorizations that change its weights without changing its function. We present a conceptually simple and easy to implement method that uses this property, and we show that proper factorizations significantly decrease the degradation caused by quantization. We demonstrate improvement on a wide variety of networks and achieve state-of-the-art degradation results for MobileNets. While our focus is on quantization, this type of factorization is also applicable to other domains such as network pruning, neural network regularization, and network interpretability.
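The core observation can be illustrated with a small numerical sketch. The snippet below is a minimal illustration, not the paper's implementation: it assumes two consecutive fully connected layers with a ReLU between them, uses only NumPy, and picks an illustrative equalization rule (matching each output channel's max-abs weight to the layer-wide maximum), which is not necessarily the exact criterion used by the authors. It rescales the output channels of one layer and inversely rescales the matching input channels of the next, leaving the network's function unchanged.

import numpy as np

def factorize_pair(w1, b1, w2):
    # w1: (out1, in1) weights of layer 1, b1: (out1,) bias,
    # w2: (out2, out1) weights of layer 2.
    # Choose a positive scale per output channel of layer 1 so that every
    # channel spans the same max-abs weight range (illustrative rule only).
    per_channel_max = np.abs(w1).max(axis=1)          # (out1,)
    target = np.abs(w1).max()                         # layer-wide max-abs weight
    s = target / np.maximum(per_channel_max, 1e-12)   # positive scale factors
    w1_eq = w1 * s[:, None]                           # scale rows of layer 1
    b1_eq = b1 * s                                    # bias scales with its channel
    w2_eq = w2 / s[None, :]                           # inverse scale on layer 2 columns
    return w1_eq, b1_eq, w2_eq, s

def forward(x, w1, b1, w2):
    h = np.maximum(w1 @ x + b1, 0.0)   # ReLU; relu(s * z) == s * relu(z) for s > 0
    return w2 @ h

rng = np.random.default_rng(0)
w1, b1, w2 = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(3, 8))
x = rng.normal(size=4)
w1_eq, b1_eq, w2_eq, _ = factorize_pair(w1, b1, w2)
# Same function, different (more quantization-friendly) weights:
assert np.allclose(forward(x, w1, b1, w2), forward(x, w1_eq, b1_eq, w2_eq))

Because the scale factors are positive they commute with the ReLU, which is why the rescaled network computes exactly the same outputs; after equalization the per-channel weight ranges are more uniform, so a single quantization grid per tensor discards less information.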

Related research

07/01/2019
Weight Normalization based Quantization for Deep Neural Network Compression
With the development of deep neural networks, the size of network models...

06/14/2018
Scalable Neural Network Compression and Pruning Using Hard Clustering and L1 Regularization
We propose a simple and easy to implement neural network compression alg...

12/15/2020
Exploring Neural Networks Quantization via Layer-Wise Quantization Analysis
Quantization is an essential step in the efficient deployment of deep le...

04/25/2021
Quantization of Deep Neural Networks for Accurate Edge Computing
Deep neural networks (DNNs) have demonstrated their great potential in r...

09/20/2023
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
Quantization is a widely used compression method that effectively reduce...

07/30/2021
Pruning Neural Networks with Interpolative Decompositions
We introduce a principled approach to neural network pruning that casts ...

11/30/2020
FactorizeNet: Progressive Depth Factorization for Efficient Network Architecture Exploration Under Quantization Constraints
Depth factorization and quantization have emerged as two of the principa...
