Post-training Quantization for Neural Networks with Provable Guarantees

01/26/2022
by Jinjie Zhang, et al.

While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized (e.g., 4-bit or binary) counterparts, massive savings in computation cost, memory, and power consumption are attained. We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism, and rigorously analyze its error. We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights, i.e., the level of over-parametrization. Our result holds across a range of input distributions and for both fully-connected and convolutional architectures. To empirically evaluate the method, we quantize several common architectures with few bits per weight and test them on ImageNet, observing only a minor loss of accuracy. We also demonstrate that standard modifications, such as bias correction and mixed precision quantization, further improve accuracy.
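As a concrete reading of the greedy path-following mechanism: GPFQ quantizes the weights of a neuron one at a time, maintaining the running error between the true and quantized pre-activations on calibration data, and rounds each weight so as to best cancel that accumulated error. In symbols (notation assumed here, not taken from the paper): for calibration data X and weights w, the abstract's single-layer guarantee says the relative square error ||Xw - Xq||^2 / ||Xw||^2 essentially decays like 1/N, where N is the number of weights. The sketch below is a minimal NumPy illustration of this idea under simplifying assumptions; the function name, the toy alphabet, and the calibration setup are hypothetical, not the paper's exact procedure.

    import numpy as np

    def gpfq_quantize(w, X, alphabet):
        """Greedy path-following quantization sketch for one neuron.

        w        : (N,) full-precision weights of a single neuron.
        X        : (m, N) calibration inputs; column t multiplies w[t].
        alphabet : 1-D array of allowed quantized values (e.g., a 4-bit grid).

        Returns q with every entry in `alphabet`, chosen greedily so that
        X @ q tracks X @ w as closely as possible at each step.
        """
        N = w.shape[0]
        q = np.empty(N)
        u = np.zeros(X.shape[0])       # running error X @ (w[:t] - q[:t])
        for t in range(N):
            x_t = X[:, t]
            v = u + w[t] * x_t         # error if weight t stayed full precision
            # Unconstrained minimizer of ||v - p * x_t||_2 over scalar p:
            denom = x_t @ x_t
            p_star = (x_t @ v) / denom if denom > 0 else w[t]
            # Constrained to the alphabet, the best choice is the nearest level.
            q[t] = alphabet[np.argmin(np.abs(alphabet - p_star))]
            u = v - q[t] * x_t         # carry the residual forward
        return q

For instance, with X drawn from a standard Gaussian, w a random weight vector scaled to lie within the alphabet's range, and alphabet = np.linspace(-1, 1, 16) as a toy 4-bit grid, one can observe the relative square error np.linalg.norm(X @ w - X @ q)**2 / np.linalg.norm(X @ w)**2 decreasing as N grows, consistent with the stated bound.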


Related research

02/20/2020
Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision
We consider the post-training quantization problem, which discretizes th...

09/20/2023
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
Quantization is a widely used compression method that effectively reduce...

02/19/2022
Bit-wise Training of Neural Network Weights
We introduce an algorithm where the individual bits representing the wei...

04/26/2022
RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization
We introduce a Power-of-Two post-training quantization (PTQ) method for ...

12/06/2022
QEBVerif: Quantization Error Bound Verification of Neural Networks
While deep neural networks (DNNs) have demonstrated impressive performan...

08/08/2023
Quantization Aware Factorization for Deep Neural Network Compression
Tensor decomposition of convolutional and fully-connected layers is an e...

06/23/2023
QNNRepair: Quantized Neural Network Repair
We present QNNRepair, the first method in the literature for repairing q...
