Analysis of Quantization on MLP-based Vision Models

09/14/2022
by Lingran Zhao, et al.

Quantization is widely adopted as a model compression technique: it obtains efficient models by converting the floating-point weights and activations of a neural network into lower-bit integers. Quantization has been proven to work well on convolutional neural networks and transformer-based models. Although those models perform well, recent works have shown that MLP-based models can achieve comparable results on tasks ranging from computer vision and NLP to 3D point clouds, while delivering higher throughput thanks to their parallelism and architectural simplicity. However, as we show in the paper, directly applying quantization to MLP-based models leads to significant accuracy degradation. Based on our analysis, two major issues account for the accuracy gap: 1) the range of activations in MLP-based models can be too large to quantize, and 2) specific components of MLP-based models are sensitive to quantization. Consequently, we propose to 1) apply LayerNorm to control the quantization range of activations, 2) utilize bounded activation functions, 3) apply percentile quantization to activations, 4) use our improved module, named multiple token-mixing MLPs, and 5) apply a linear asymmetric quantizer to sensitive operations. Equipped with the above techniques, our Q-MLP models achieve 79.68 (model size 30 MB) and 78.47 ...

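As a rough illustration of two of the ingredients above (a linear asymmetric quantizer and percentile clipping of activations), the sketch below shows how clipping the activation range before uniform quantization reduces quantization error when a few outliers inflate the min-max range. This is a minimal NumPy sketch, not the paper's implementation: the percentile thresholds, tensor shape, and function names (percentile_clip, asymmetric_quantize) are illustrative assumptions.

```python
import numpy as np

def percentile_clip(x, lower_pct=0.1, upper_pct=99.9):
    """Clip a tensor to a percentile range before quantization.
    The percentile values are illustrative, not the paper's settings."""
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)

def asymmetric_quantize(x, num_bits=8):
    """Linear (uniform) asymmetric quantization of a float tensor to unsigned ints."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return (q.astype(np.float32) - float(zero_point)) * scale

# Simulated activations with a heavy-tailed range: a handful of outliers
# inflate the min-max range and waste most of the 8-bit grid.
rng = np.random.default_rng(0)
acts = rng.standard_normal((64, 196, 384)).astype(np.float32)
acts[0, 0, :10] *= 50.0  # inject a few large outliers

# Plain min-max quantization vs. percentile-clipped quantization.
q_raw, s_raw, z_raw = asymmetric_quantize(acts)
q_clip, s_clip, z_clip = asymmetric_quantize(percentile_clip(acts))

err_raw = np.abs(dequantize(q_raw, s_raw, z_raw) - acts).mean()
err_clip = np.abs(dequantize(q_clip, s_clip, z_clip) - acts).mean()
print(f"mean abs error  min-max: {err_raw:.5f}  percentile-clipped: {err_clip:.5f}")
```

Under these illustrative settings, the clipped quantizer trades a large error on a handful of outliers for a much finer grid over the bulk of the activations, which is the intuition behind the percentile quantization and range-control techniques listed in the abstract.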
