Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

05/17/2023
by   Jiong Zhu, et al.

Distributed training of GNNs enables learning on massive graphs (e.g., social and e-commerce networks) that exceed the storage and computational capacity of a single machine. To reach performance comparable to centralized training, existing distributed frameworks focus on maximally recovering cross-instance node dependencies, either through communication across instances or through periodic fallback to centralized training; both strategies create overhead and limit framework scalability. In this work, we present a simplified framework for distributed GNN training that does not rely on these costly operations and offers improved scalability, convergence speed, and performance over state-of-the-art approaches. Specifically, our framework (1) assembles independent trainers, each of which asynchronously learns a local model on locally available parts of the training graph, and (2) only conducts periodic (time-based) model aggregation to synchronize the local models. Backed by our theoretical analysis, our framework does not attempt to maximize the recovery of cross-instance node dependencies, which has been considered the key to closing the performance gap between model aggregation and centralized training. Instead, it leverages randomized assignment of nodes or super-nodes (i.e., collections of original nodes) to partition the training graph, which improves data uniformity and minimizes the discrepancy of gradients and loss functions across instances. In experiments on social and e-commerce networks with up to 1.3 billion edges, our proposed RandomTMA and SuperTMA approaches, despite using less training data, achieve state-of-the-art performance and a 2.31x speedup over the fastest baseline, and show better robustness to trainer failures.
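The two mechanisms the abstract describes, randomized assignment of nodes or super-nodes to trainers and periodic (time-based) averaging of the independently trained local models, can be illustrated with a short Python sketch. This is not the authors' implementation: the function names (random_partition, aggregate, train), the parameters (num_trainers, aggregation_interval), and the serialized simulation of the asynchronous trainers are illustrative assumptions made here for clarity.

# Minimal sketch (not the paper's released code) of the two ideas in the
# abstract: (1) randomized assignment of nodes or super-nodes to trainers,
# and (2) periodic, time-based averaging of the local model parameters.
import random
import time
import numpy as np


def random_partition(node_ids, num_trainers, super_node_of=None, seed=0):
    """Assign nodes to trainers uniformly at random.

    If super_node_of maps each node to a super-node (a cluster of original
    nodes), all members of a super-node land on the same trainer, mimicking
    the SuperTMA variant; otherwise nodes are assigned independently,
    mimicking RandomTMA.
    """
    rng = random.Random(seed)
    parts = [[] for _ in range(num_trainers)]
    if super_node_of is None:
        for v in node_ids:
            parts[rng.randrange(num_trainers)].append(v)
    else:
        super_to_trainer = {}
        for v in node_ids:
            s = super_node_of[v]
            if s not in super_to_trainer:
                super_to_trainer[s] = rng.randrange(num_trainers)
            parts[super_to_trainer[s]].append(v)
    return parts


def aggregate(local_params):
    """Average the local models' parameters (simple model aggregation)."""
    return {
        name: np.mean([p[name] for p in local_params], axis=0)
        for name in local_params[0]
    }


def train(parts, init_params, local_step, aggregation_interval=60.0, rounds=10):
    """Run local training with periodic, time-based model aggregation.

    In the real setting each trainer runs asynchronously on its own machine;
    here the trainers are serialized in a single loop for readability.
    """
    local_params = [{k: v.copy() for k, v in init_params.items()} for _ in parts]
    for _ in range(rounds):
        deadline = time.time() + aggregation_interval
        while time.time() < deadline:
            for i, part in enumerate(parts):
                local_step(local_params[i], part)  # local GNN update on partition i
        global_params = aggregate(local_params)
        # Broadcast the aggregated model back to all trainers.
        local_params = [{k: v.copy() for k, v in global_params.items()} for _ in parts]
    return global_params


if __name__ == "__main__":
    nodes = list(range(1000))
    parts = random_partition(nodes, num_trainers=4)
    init = {"W": np.zeros((8, 8))}

    def sgd_step(params, part):
        # Placeholder: a real trainer would run mini-batch GNN training on `part`.
        params["W"] += 0.01 * np.random.randn(*params["W"].shape)

    final = train(parts, init, sgd_step, aggregation_interval=0.1, rounds=3)
    print(final["W"].mean())

In the actual framework, each trainer would perform mini-batch GNN training on its own partition between aggregation rounds; the sketch above only stubs out that local update and serializes the asynchronous trainers.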

