QGTC: Accelerating Quantized Graph Neural Networks via GPU Tensor Core

11/18/2021
by   Yuke Wang, et al.
0

Over the most recent years, quantized graph neural network (QGNN) attracts lots of research and industry attention due to its high robustness and low computation and memory overhead. Unfortunately, the performance gains of QGNN have never been realized on modern GPU platforms. To this end, we propose the first Tensor Core (TC) based computing framework, QGTC, to support any-bitwidth computation for QGNNs on GPUs. We introduce a novel quantized low-bit arithmetic design based on the low-bit data representation and bit-decomposed computation. We craft a novel TC-tailored CUDA kernel design by incorporating 3D-stacked bit compression, zero-tile jumping, and non-zero tile reuse technique to improve the performance systematically. We incorporate an effective bandwidth-optimized subgraph packing strategy to maximize the transferring efficiency between CPU host and GPU device. We integrate QGTC with Pytorch for better programmability and extensibility. Extensive experiments demonstrate that QGTC achieves on average 2.7x speedup compared with the state-of-the-art Deep Graph Library framework across diverse settings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/03/2021

TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs

Recently, graph neural networks (GNNs), as the backbone of graph-based m...
research
06/23/2021

APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

Over the years, accelerating neural networks with quantization has been ...
research
09/12/2021

Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression

Stencil computation is an important class of scientific applications tha...
research
12/28/2021

HiKonv: High Throughput Quantized Convolution With Novel Bit-wise Management and Computation

Quantization for Convolutional Neural Network (CNN) has shown significan...
research
05/27/2017

BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

Binary Neural Networks (BNNs) can drastically reduce memory size and acc...
research
08/26/2020

Exploring the Design Space of Static and Incremental Graph Connectivity Algorithms on GPUs

Connected components and spanning forest are fundamental graph algorithm...
research
09/23/2022

Faith: An Efficient Framework for Transformer Verification on GPUs

Transformer verification draws increasing attention in machine learning ...

Please sign up or login with your details

Forgot password? Click here to reset