MetaGrad: Adaptive Gradient Quantization with Hypernetworks

03/04/2023
by Kaixin Xu, et al.

A popular line of work in network compression is Quantization-Aware Training (QAT), which accelerates the forward pass during both training and inference. However, little prior effort has been made to quantize and accelerate the backward pass, even though it accounts for around half of the training time. This can be partly attributed to the fact that errors introduced by low-precision gradients in the backward pass cannot be amortized by the training objective, as they can in the QAT setting. In this work, we propose to solve this problem by incorporating the gradients into the computation graph of the next training iteration via a hypernetwork. Experiments on the CIFAR-10 dataset with different CNN architectures demonstrate that our hypernetwork-based approach effectively reduces the negative effect of gradient quantization noise and successfully quantizes gradients to INT4 with only a 0.64 accuracy drop for VGG-16.
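
The abstract's core idea, quantizing backward-pass gradients and letting a hypernetwork compensate for the quantization noise, can be illustrated with a rough PyTorch sketch. The snippet below is not the paper's implementation: the names int4_quantize and GradHyperNet, the per-tensor symmetric INT4 quantizer, and the statistics-based correction are assumptions made for illustration, and it omits the paper's key step of keeping the hypernetwork's output in the computation graph of the next training iteration so that the hypernetwork is trained end-to-end.

```python
# Hypothetical sketch of INT4 gradient quantization with a small hypernetwork
# that corrects the dequantized gradient before the weight update. All names
# and design choices here are illustrative, not the paper's actual API.
import torch
import torch.nn as nn


def int4_quantize(g: torch.Tensor):
    """Symmetric per-tensor INT4 quantization: round to 16 levels in [-8, 7]."""
    scale = g.abs().max().clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(g / scale), -8, 7)
    return q, scale


def int4_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q * scale


class GradHyperNet(nn.Module):
    """Toy hypernetwork: maps a few statistics of the dequantized gradient to a
    per-tensor gain and bias applied before the parameter update."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, g_hat: torch.Tensor) -> torch.Tensor:
        stats = torch.stack([g_hat.mean(), g_hat.std(), g_hat.abs().max()])
        gain, bias = self.mlp(stats)
        return g_hat * (1.0 + gain) + bias


# Usage sketch: one SGD step where each gradient tensor is quantized to INT4,
# dequantized, and then corrected by the hypernetwork.
model = nn.Linear(10, 2)
hypernet = GradHyperNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(4, 10), torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            q, s = int4_quantize(p.grad)                    # simulate low-precision backward
            p.grad.copy_(hypernet(int4_dequantize(q, s)))   # hypernetwork correction
optimizer.step()
# NOTE: in the paper the hypernetwork output is kept in the next iteration's
# computation graph so the hypernetwork is trained jointly; this sketch skips that.
```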

Related research

02/09/2021  Distribution Adaptive INT8 Quantization for Training CNNs
Researches have demonstrated that low bit-width (e.g., INT8) quantizatio...

12/11/2022  Error-aware Quantization through Noise Tempering
Quantization has become a predominant approach for model compression, en...

11/24/2021  Softmax Gradient Tampering: Decoupling the Backward Pass for Improved Fitting
We introduce Softmax Gradient Tampering, a technique for modifying the g...

12/29/2019  Towards Unified INT8 Training for Convolutional Neural Network
Recently low-bit (e.g., 8-bit) network quantization has been extensively...

03/08/2021  Reliability-Aware Quantization for Anti-Aging NPUs
Transistor aging is one of the major concerns that challenges designers ...

11/08/2018  GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
Data parallelism can boost the training speed of convolutional neural ne...

02/01/2022  Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
Memory footprint is one of the main limiting factors for large neural ne...