QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization

03/11/2022
by   Xiuying Wei, et al.

Recently, post-training quantization (PTQ) has attracted much attention as a way to produce efficient neural networks without lengthy retraining. Despite its low cost, current PTQ works tend to fail under the extremely low-bit setting. In this study, we pioneeringly confirm that properly incorporating activation quantization into the PTQ reconstruction benefits the final accuracy. To deeply understand the inherent reason, a theoretical framework is established, indicating that the flatness of the optimized low-bit model on calibration and test data is crucial. Based on this conclusion, a simple yet effective approach dubbed QDrop is proposed, which randomly drops the quantization of activations during PTQ. Extensive experiments on various tasks, including computer vision (image classification, object detection) and natural language processing (text classification and question answering), prove its superiority. With QDrop, the limit of PTQ is pushed to 2-bit activations for the first time, and the accuracy boost can be up to 51.49%. QDrop establishes a new state of the art for PTQ. Our code is available at https://github.com/wimh966/QDrop and has been integrated into MQBench (https://github.com/ModelTC/MQBench).
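To make the core idea concrete, the following is a minimal PyTorch-style sketch of randomly dropping activation quantization during PTQ reconstruction. The class and function names (fake_quantize, QDropActivationQuant, drop_prob) are illustrative assumptions, not the authors' reference implementation; see the QDrop repository for the actual code.

    # Minimal sketch of the random-drop idea (hypothetical names, not the official API).
    import torch
    import torch.nn as nn

    def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
        """Uniform fake quantization: quantize, clamp, then dequantize."""
        q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
        return (q - zero_point) * scale

    class QDropActivationQuant(nn.Module):
        """Activation fake-quantizer that randomly keeps full-precision values
        during calibration/reconstruction and always quantizes at evaluation."""

        def __init__(self, scale, zero_point, drop_prob=0.5):
            super().__init__()
            self.register_buffer("scale", torch.tensor(float(scale)))
            self.register_buffer("zero_point", torch.tensor(float(zero_point)))
            self.drop_prob = drop_prob  # probability of keeping the FP activation

        def forward(self, x):
            x_q = fake_quantize(x, self.scale, self.zero_point)
            if self.training:
                # Element-wise mask: with probability drop_prob, drop quantization
                # and pass the full-precision activation through instead.
                keep_fp = torch.rand_like(x) < self.drop_prob
                return torch.where(keep_fp, x, x_q)
            return x_q

In this sketch, the random mixing of quantized and full-precision activations only happens while the module is in training mode (i.e., during block-wise reconstruction on calibration data); at inference time every activation is quantized as usual.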


