REx: Data-Free Residual Quantization Error Expansion

03/28/2022
by Edouard Yvinec, et al.

Deep neural networks (DNNs) are nowadays ubiquitous in the computer vision landscape. However, they suffer from high computational costs at inference time, particularly when evaluated on edge devices. This problem is generally addressed via post-hoc quantization, i.e. converting the DNN values (weights and inputs) from floating point to lower bit-width representations, e.g. int8, int4, or ternary. In this paper, we propose REx, a data-free quantization algorithm for pre-trained models that is compliant with data protection regulations, convenient, and fast to execute. First, we improve upon the naive linear quantization operator by decomposing the weights as an expansion of residual quantization errors. Second, we propose a budgeted group-sparsity formulation to achieve better accuracy vs. number of bit-wise operations trade-offs with sparse, higher expansion orders. Third, we show that this sparse expansion can be approximated by an ensemble of quantized neural networks to dramatically improve the evaluation speed through more efficient parallelization. We provide theoretical guarantees of the efficiency of REx as well as a thorough empirical validation on several popular DNN architectures applied to multiple computer vision problems, e.g. ImageNet classification, object detection, and semantic segmentation. In particular, we show that REx significantly outperforms existing state-of-the-art data-free quantization techniques.
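The core idea of the expansion, decomposing the weights as a sum of quantized residual quantization errors, can be illustrated with a minimal sketch. The snippet below assumes a symmetric per-tensor linear quantizer, and the function names (linear_quantize, residual_expansion) are illustrative rather than taken from the paper's code; it also omits the budgeted group-sparsity and the ensemble-based parallelization.

import numpy as np

def linear_quantize(w, n_bits=4):
    """Symmetric per-tensor linear quantizer: returns the de-quantized tensor."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = max(np.abs(w).max(), 1e-12) / q_max
    q = np.clip(np.round(w / scale), -q_max, q_max)
    return q * scale

def residual_expansion(w, n_bits=4, order=3):
    """Expand w as a sum of `order` quantized terms, each encoding the
    residual quantization error left by the previous terms."""
    terms, residual = [], w.copy()
    for _ in range(order):
        dq = linear_quantize(residual, n_bits)
        terms.append(dq)
        residual = residual - dq  # remaining quantization error
    return terms

# Usage: the sum of the expansion terms approximates the full-precision
# weights, and the approximation error shrinks as the order grows.
w = np.random.randn(64, 128).astype(np.float32)
terms = residual_expansion(w, n_bits=4, order=3)
approx = sum(terms)
print(np.abs(w - approx).mean())

At inference, each expansion term can be applied as an additional low bit-width matrix multiplication whose outputs are summed, which is what makes the ensemble-style parallelization described in the abstract possible.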

Related research

SPIQ: Data-Free Per-Channel Static Input Quantization (03/28/2022)
Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks (10/14/2022)
PowerQuant: Automorphism Search for Non-Uniform Quantization (01/24/2023)
Designing strong baselines for ternary neural network quantization through support and mass equalization (06/30/2023)
Up or Down? Adaptive Rounding for Post-Training Quantization (04/22/2020)
An Integrated Approach to Produce Robust Models with High Efficiency (08/31/2020)
Improving Neural Network Quantization using Outlier Channel Splitting (01/28/2019)
