SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization

09/20/2023
by Jinjie Zhang, et al.

Quantization is a widely used compression method that effectively reduces redundancies in over-parameterized neural networks. However, existing quantization techniques for deep neural networks often lack a comprehensive error analysis due to the presence of non-convex loss functions and nonlinear activations. In this paper, we propose a fast stochastic algorithm for quantizing the weights of fully trained neural networks. Our approach leverages a greedy path-following mechanism in combination with a stochastic quantizer. Its computational complexity scales only linearly with the number of weights in the network, thereby enabling the efficient quantization of large networks. Importantly, we establish, for the first time, full-network error bounds under an infinite alphabet condition and minimal assumptions on the weights and input data. As an application of this result, we prove that when quantizing a multi-layer network with Gaussian weights, the relative square quantization error decays linearly as the degree of over-parameterization increases. Furthermore, we demonstrate that error bounds equivalent to those obtained in the infinite alphabet case can be achieved using on the order of a mere log log N bits per weight, where N denotes the largest number of neurons in a layer.
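Since the abstract only describes the method at a high level, the following is a minimal sketch of what a greedy path-following step combined with an unbiased stochastic rounding quantizer could look like for a single fully connected layer. The function names, the uniform infinite alphabet delta*Z, and the residual-update form are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np

def stochastic_round(x, delta):
    """Stochastically round x onto the grid delta*Z: round up with probability
    equal to the normalized fractional part, so the quantizer is unbiased."""
    lower = np.floor(x / delta) * delta
    p = (x - lower) / delta              # probability of rounding up
    return lower + delta * (np.random.rand() < p)

def quantize_layer(W, X, X_tilde, delta):
    """Sketch of greedy path-following quantization of one fully connected layer.
    W:        (N_out, N_in) trained weights
    X:        (m, N_in) inputs to this layer in the original network
    X_tilde:  (m, N_in) inputs to this layer in the partially quantized network
    delta:    step size of the quantization alphabet (assumed uniform)."""
    Q = np.zeros_like(W)
    for i, w in enumerate(W):                    # quantize each neuron separately
        u = np.zeros(X.shape[0])                 # running residual of pre-activations
        for t in range(len(w)):
            xt, xqt = X[:, t], X_tilde[:, t]
            # greedy target: scalar that best reduces the current residual
            c = (xqt @ (u + w[t] * xt)) / (xqt @ xqt + 1e-12)
            Q[i, t] = stochastic_round(c, delta)  # stochastic quantizer
            u = u + w[t] * xt - Q[i, t] * xqt     # update residual
    return Q
```

Because each weight is visited once and the residual update costs O(m), the total work per layer is linear in the number of weights (times the number of sample inputs m), which is consistent with the linear scaling claimed above.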

