Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

06/02/2023
by   Borui Wan, et al.

Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node features, embeddings, and embedding gradients (all referred to as messages) across devices bring significant communication overhead for nodes with remote neighbors on other devices (marginal nodes) and unnecessary waiting time for nodes without remote neighbors (central nodes) in the training graph. This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph GNN training. We stochastically quantize messages transferred across devices to lower-precision integers to reduce communication traffic and advocate communication-computation parallelization between marginal nodes and central nodes. We provide theoretical analysis to prove fast training convergence (at the rate of O(T^-1) with T being the total number of training epochs) and design an adaptive quantization bit-width assignment scheme for each message based on the analysis, targeting a good trade-off between training convergence and efficiency. Extensive experiments on mainstream graph datasets show that AdaQP substantially improves distributed full-graph training's throughput (up to 3.01×) with negligible accuracy drop (at most 0.30%) or even accuracy improvement (up to 0.19%), demonstrating clear advantages over the state-of-the-art works.
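As a rough illustration of the stochastic quantization idea described in the abstract, the sketch below quantizes a floating-point message tensor to b-bit integers with stochastic rounding, which keeps the dequantized message unbiased in expectation. This is a minimal sketch, not AdaQP's actual implementation; the function names and the per-tensor min/max scaling are assumptions made for illustration.

```python
import torch

def stochastic_quantize(msg: torch.Tensor, bits: int):
    """Quantize a float message tensor to unsigned `bits`-bit integers.

    Rounding up with probability equal to the fractional residual makes the
    quantizer unbiased: E[dequantize(q)] == msg. Hypothetical helper, not the
    paper's API.
    """
    levels = 2 ** bits - 1                      # number of quantization steps
    lo, hi = msg.min(), msg.max()
    scale = (hi - lo) / levels
    if scale == 0:                              # constant tensor: avoid div-by-zero
        scale = torch.tensor(1.0, device=msg.device)
    normalized = (msg - lo) / scale             # values in [0, levels]
    floor = normalized.floor()
    prob_up = normalized - floor                # fractional part drives stochastic rounding
    q = floor + (torch.rand_like(msg) < prob_up).to(msg.dtype)
    dtype = torch.uint8 if bits <= 8 else torch.int16
    return q.to(dtype), lo, scale               # integers plus scaling metadata to transmit

def dequantize(q: torch.Tensor, lo: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an (unbiased) floating-point estimate of the original message."""
    return q.to(scale.dtype) * scale + lo
```

For example, sending 2-bit messages instead of float32 cuts the transferred bytes by roughly 16x (ignoring the small per-tensor min/scale metadata) at the cost of higher quantization variance; the adaptive bit-width assignment scheme described above targets exactly this traffic-versus-variance trade-off per message.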

Related research

03/02/2023
Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication
Training Graph Neural Networks (GNNs) on large graphs is challenging due...

05/31/2022
Distributed Graph Neural Network Training with Periodic Historical Embedding Synchronization
Despite the recent success of Graph Neural Networks (GNNs), it remains c...

08/29/2023
Low-bit Quantization for Deep Graph Neural Networks with Smoothness-aware Message Propagation
Graph Neural Network (GNN) training and inference involve significant ch...

11/16/2021
Learn Locally, Correct Globally: A Distributed Algorithm for Training Graph Neural Networks
Despite the recent success of Graph Neural Networks (GNNs), training GNN...

11/01/2019
On Distributed Quantization for Classification
We consider the problem of distributed feature quantization, where the g...

05/17/2023
Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation
Distributed training of GNNs enables learning on massive graphs (e.g., s...

09/06/2022
Rethinking The Memory Staleness Problem In Dynamics GNN
The staleness problem is a well-known problem when working with dynamic ...
