Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization

09/21/2023
by   Sebastian Eliassen, et al.

Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Work by Liu et al. (2022) proposed extreme activation compression (EXACT), which demonstrated a drastic reduction in memory consumption by quantizing the intermediate activation maps down to INT2 precision. They showed little to no degradation in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activation maps. We experimentally analyze different block sizes and show a further reduction in memory consumption (>15%) while performing extreme extents of quantization, with performance trade-offs similar to those of the original EXACT. Further, we present a correction to EXACT's assumption that the intermediate activation maps are uniformly distributed, and show improved variance estimates for the quantization and dequantization steps.
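To make the idea concrete, below is a minimal sketch (not the authors' code) of block-wise activation quantization to INT2: the activation map is split into fixed-size blocks, each block is scaled by its own min/max range, and values are stochastically rounded to 2-bit codes that can be dequantized during the backward pass. The function names, the `block_size=64` choice, and the flattening scheme are illustrative assumptions, not the exact configuration used in the paper.

```python
# Illustrative sketch of block-wise INT2 activation compression (assumed setup,
# not the authors' implementation).
import torch

def blockwise_quantize(x: torch.Tensor, block_size: int = 64, bits: int = 2):
    """Quantize an activation map block by block with per-block min-max scaling.

    Returns the integer codes plus the per-block (min, scale) needed to dequantize.
    """
    levels = 2 ** bits - 1                              # INT2 -> codes in {0,1,2,3}
    flat = x.reshape(-1)
    pad = (-flat.numel()) % block_size                  # pad so blocks divide evenly
    flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.reshape(-1, block_size)

    b_min = blocks.min(dim=1, keepdim=True).values
    b_max = blocks.max(dim=1, keepdim=True).values
    scale = (b_max - b_min).clamp_min(1e-12) / levels   # per-block step size

    normalized = (blocks - b_min) / scale
    # Stochastic rounding keeps the quantizer unbiased in expectation.
    codes = torch.clamp(torch.floor(normalized + torch.rand_like(normalized)), 0, levels)
    return codes.to(torch.uint8), b_min, scale, x.shape, pad

def blockwise_dequantize(codes, b_min, scale, shape, pad):
    blocks = codes.float() * scale + b_min
    flat = blocks.reshape(-1)
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)

# Example: compress a layer's activation map and reconstruct it.
act = torch.randn(2048, 128)
packed = blockwise_quantize(act, block_size=64, bits=2)
recon = blockwise_dequantize(*packed)
print((recon - act).pow(2).mean())                      # quantization MSE
```

The key difference from a per-tensor scheme is that the scale and zero point are computed per block, so the quantization variance depends on the local range of each block rather than on the full activation map, which is what the block-size analysis in the paper trades off against the memory needed to store the per-block statistics.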
