Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

02/20/2020
by Xingchao Liu, et al.

We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight with a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure and adaptively decide the number of low-precision points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an 'effect of mixed precision' without a physical mixed-precision implementation (which requires specialized hardware accelerators). Empirically, our method can be implemented with common operands, bringing almost no memory or computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and that it generalizes to more challenging tasks such as PASCAL VOC object detection.
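
To make the idea concrete, below is a minimal NumPy sketch of greedy multipoint quantization: a weight vector is approximated by a sum of low-bit vectors scaled by full-precision coefficients, adding points until the approximation error is small. All names (quantize_to_grid, multipoint_quantize) and parameters (n_bits, max_points, tol) are illustrative assumptions, and for simplicity the stopping criterion uses the weight-space residual rather than the layer's output error used in the paper.

```python
import numpy as np

def quantize_to_grid(v, n_bits=4):
    """Uniformly quantize a vector to a symmetric low-bit grid (illustrative)."""
    levels = 2 ** (n_bits - 1) - 1
    vmax = np.max(np.abs(v))
    if vmax == 0:
        return np.zeros_like(v)
    scale = vmax / levels
    return np.clip(np.round(v / scale), -levels, levels) * scale

def multipoint_quantize(w, n_bits=4, max_points=4, tol=1e-2):
    """Greedy multipoint quantization sketch (not the paper's exact algorithm).

    Approximates the full-precision vector w by  w ~ sum_i a_i * q_i,
    where each q_i is a low-bit vector and the a_i are full-precision
    scalars. Each step quantizes the current residual, then refits all
    coefficients by least squares; points are added until the relative
    residual drops below tol or max_points is reached.
    """
    residual = w.copy()
    points, coeffs = [], np.array([])
    for _ in range(max_points):
        q = quantize_to_grid(residual, n_bits)
        if np.allclose(q, 0):
            break
        points.append(q)
        Q = np.stack(points, axis=1)                       # shape (d, k)
        coeffs, *_ = np.linalg.lstsq(Q, w, rcond=None)     # refit coefficients
        residual = w - Q @ coeffs
        if np.linalg.norm(residual) / (np.linalg.norm(w) + 1e-12) < tol:
            break
    return points, coeffs

# Usage: vectors that are harder to approximate end up with more points,
# giving a mixed-precision-like effect with only low-bit storage plus scalars.
rng = np.random.default_rng(0)
w = rng.normal(size=512)
points, coeffs = multipoint_quantize(w, n_bits=4, max_points=4, tol=1e-2)
approx = np.stack(points, axis=1) @ coeffs
print(len(points), np.linalg.norm(w - approx) / np.linalg.norm(w))
```

At inference time each extra point adds one more low-bit matrix-vector product and a scalar multiply, which is why the scheme can run on common operands without dedicated mixed-precision hardware.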
