SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization

by Jinjie Zhang, et al.

Quantization is a widely used compression method that effectively reduces redundancies in over-parameterized neural networks. However, existing quantization techniques for deep neural networks often lack a comprehensive error analysis due to the presence of non-convex loss functions and nonlinear activations. In this paper, we propose a fast stochastic algorithm for quantizing the weights of fully trained neural networks. Our approach leverages a greedy path-following mechanism in combination with a stochastic quantizer. Its computational complexity scales only linearly with the number of weights in the network, thereby enabling the efficient quantization of large networks. Importantly, we establish, for the first time, full-network error bounds, under an infinite alphabet condition and minimal assumptions on the weights and input data. As an application of this result, we prove that when quantizing a multi-layer network having Gaussian weights, the relative square quantization error exhibits a linear decay as the degree of over-parameterization increases. Furthermore, we demonstrate that it is possible to achieve error bounds equivalent to those obtained in the infinite alphabet case, using on the order of a mere log log N bits per weight, where N represents the largest number of neurons in a layer.
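To make the two ingredients named above concrete, the following is a minimal sketch (not the authors' reference implementation) of what a greedy path-following pass with a stochastic quantizer could look like for a single neuron. It assumes an infinite alphabet realized as a uniform grid of spacing `step`, a data matrix `X` of shape (m, N), and real weights `w`; the function names `stochastic_round` and `spfq_neuron` are hypothetical. Each weight is quantized in turn, and the running residual `u` carries the error accumulated so far through the data, so the cost is linear in the number of weights.

```python
import numpy as np

def stochastic_round(v, step, rng):
    # Stochastic quantizer on a uniform grid of spacing `step`:
    # round v up with probability equal to its fractional position
    # between the two neighboring grid points (unbiased in expectation).
    lo = np.floor(v / step) * step
    p = (v - lo) / step
    return lo + step * (rng.random() < p)

def spfq_neuron(w, X, step, seed=0):
    # Greedy path-following quantization of one neuron's weights w (length N)
    # against input data X (m x N): quantize coordinates sequentially while
    # tracking the residual u = X @ (w[:t] - q[:t]) of the partial outputs.
    rng = np.random.default_rng(seed)
    m, N = X.shape
    u = np.zeros(m)          # running output residual
    q = np.zeros(N)          # quantized weights
    for t in range(N):
        Xt = X[:, t]
        # Real-valued value that best compensates the residual so far,
        # then snap it to the grid stochastically.
        c = w[t] + (u @ Xt) / (Xt @ Xt)
        q[t] = stochastic_round(c, step, rng)
        u = u + (w[t] - q[t]) * Xt
    return q
```

On Gaussian data and weights, the relative output error of this sketch shrinks as the grid is refined, consistent with the error-decay behavior described in the abstract; a production version would also have to handle the finite-alphabet (clipped grid) case that the log log N result addresses.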



