Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers

08/21/2023
by   Natalia Frumkin, et al.

Quantization scale and bit-width are the most important parameters when considering how to quantize a neural network. Prior work focuses on optimizing quantization scales in a global manner through gradient methods (gradient descent and Hessian analysis). Yet, when applying perturbations to quantization scales, we observe a very jagged, highly non-smooth test loss landscape. In fact, small perturbations in quantization scale can greatly affect accuracy, yielding a 0.5-0.8% accuracy boost in 4-bit quantized vision transformers (ViTs). In this regime, gradient methods break down, since they cannot reliably reach local minima. In our work, dubbed Evol-Q, we use evolutionary search to effectively traverse the non-smooth landscape. Additionally, we propose using an infoNCE loss, which not only helps combat overfitting on the small calibration dataset (1,000 images) but also makes traversing such a highly non-smooth surface easier. Evol-Q improves the top-1 accuracy of a fully quantized ViT-Base by 10.30%, 0.78%, and 0.15% for 3-bit, 4-bit, and 8-bit weight quantization levels. Extensive experiments on a variety of CNN and ViT architectures further demonstrate its robustness in extreme quantization scenarios. Our code is available at https://github.com/enyac-group/evol-q.
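To make the idea concrete, below is a minimal sketch (not the released Evol-Q implementation) of the two ingredients the abstract describes: a uniform fake-quantizer whose scale is tuned by a small mutation-and-select evolutionary loop instead of gradient descent, and an InfoNCE-style contrastive loss between quantized and full-precision features on a small calibration batch. The toy linear layer, tensor shapes, and search hyperparameters are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: evolutionary search over a quantization scale with an
# InfoNCE-style objective. All shapes and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def fake_quantize(w, scale, bits=4):
    """Uniform symmetric fake-quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale

def infonce_loss(q_feats, fp_feats, temperature=0.1):
    """Contrastive loss: each quantized feature should match its own
    full-precision counterpart against all others in the batch."""
    q = F.normalize(q_feats, dim=-1)
    p = F.normalize(fp_feats, dim=-1)
    logits = q @ p.t() / temperature
    labels = torch.arange(q.size(0))
    return F.cross_entropy(logits, labels)

# Toy stand-in for one linear layer of a ViT block (weights are synthetic).
weight = torch.randn(128, 64)
calib_x = torch.randn(256, 64)      # small calibration batch
fp_feats = calib_x @ weight.t()     # full-precision reference features

def evaluate(scale, bits=4):
    w_q = fake_quantize(weight, scale, bits)
    return infonce_loss(calib_x @ w_q.t(), fp_feats).item()

# Evolutionary loop: no gradients, just small multiplicative perturbations
# of the scale, keeping whichever candidate lowers the contrastive loss.
scale = weight.abs().max() / (2 ** 3 - 1)   # naive min-max init for 4-bit
best_loss = evaluate(scale)
for _ in range(50):
    candidates = scale * (1.0 + 0.01 * torch.randn(8))  # 8 small mutations
    for cand in candidates:
        loss = evaluate(cand)
        if loss < best_loss:
            best_loss, scale = loss, cand

print(f"final scale={scale.item():.5f}, infoNCE loss={best_loss:.4f}")
```

Because the search only compares candidate scales, it tolerates the jagged loss surface the abstract describes: a mutation that lands in a nearby, better local minimum is kept even when the gradient at the current point is uninformative.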


