Designing strong baselines for ternary neural network quantization through support and mass equalization

06/30/2023
by Edouard Yvinec, et al.

Deep neural networks (DNNs) offer the highest performance in a wide range of computer vision applications. These results rely on over-parameterized backbones, which are expensive to run. This computational burden can be dramatically reduced by quantizing floating point values to ternary values (2 bits, with each weight taking its value in {-1, 0, 1}), whether in the data-free (DFQ), post-training (PTQ) or quantization-aware training (QAT) scenario. In this context, we observe that rounding to nearest minimizes the expected error only under a uniform distribution, and thus does not account for the skewness and kurtosis of the weight distribution, which strongly affect ternary quantization performance. This raises the following question: should one minimize the highest or the average quantization error? To answer this, we design two operators, TQuant and MQuant, that correspond to these respective minimization tasks. We show experimentally that our approach significantly improves the performance of ternary quantization across a variety of DFQ, PTQ and QAT scenarios, and provides strong insights to pave the way for future research in deep neural network quantization.
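The abstract does not spell out the two operators, but the title's "support and mass equalization" suggests the following reading, sketched below in NumPy. This is a hypothetical illustration, not the paper's exact definitions: the function names (round_to_nearest, tquant, mquant), the symmetric-support assumption, and the tertile thresholds are our own. The idea: a ternary quantizer maps each weight to {-1, 0, 1}; TQuant could split the support into three equal-width bins to balance the highest error, while MQuant could place thresholds at the tertiles of the weight distribution so each level receives equal probability mass, targeting the average error.

```python
import numpy as np

def round_to_nearest(w, scale):
    # Baseline ternarization: scale, round to the nearest integer,
    # then clip to the ternary codebook {-1, 0, 1}.
    return np.clip(np.round(w / scale), -1, 1)

def tquant(w):
    # Hypothetical "support equalization": with a symmetric support
    # [-m, m], a scale of 2m/3 puts the decision thresholds at +/- m/3,
    # so each ternary level covers an equal third of the support and the
    # worst-case (highest) quantization error is the same in every bin.
    m = np.abs(w).max()
    scale = 2.0 * m / 3.0
    return np.clip(np.round(w / scale), -1, 1)

def mquant(w):
    # Hypothetical "mass equalization": thresholds at the tertiles of the
    # empirical weight distribution give each ternary level an equal share
    # of the probability mass, which targets the average error when the
    # distribution is skewed or heavy-tailed.
    lo, hi = np.quantile(w, [1.0 / 3.0, 2.0 / 3.0])
    q = np.zeros_like(w)
    q[w <= lo] = -1.0
    q[w >= hi] = 1.0
    return q

# Toy usage: a heavy-tailed (Laplace) weight tensor, where the operators disagree.
w = np.random.default_rng(0).laplace(scale=0.1, size=1000)
print(round_to_nearest(w, np.abs(w).max())[:10])
print(tquant(w)[:10])
print(mquant(w)[:10])
```

Note that under a uniform distribution on [-m, m] the tertiles sit at exactly +/- m/3, so the two sketches coincide with rounding to nearest; they only diverge, as the abstract argues, when the weight distribution is skewed or heavy-tailed.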
