DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs

10/11/2020
by Da Zheng, et al.

Graph neural networks (GNNs) have shown great success in learning from graph-structured data. They are widely used in applications such as recommendation, fraud detection, and search. In these domains, the graphs are typically large, containing hundreds of millions of nodes and billions of edges. To tackle this challenge, we develop DistDGL, a system for training GNNs in a mini-batch fashion on a cluster of machines. DistDGL is based on the Deep Graph Library (DGL), a popular GNN development framework. DistDGL distributes the graph and its associated data (initial features and embeddings) across the machines and uses this distribution to derive a computational decomposition that follows an owner-compute rule. DistDGL adopts synchronous training and allows the ego-networks that form the mini-batches to include non-local nodes. To minimize the overheads associated with distributed computation, DistDGL uses a high-quality, lightweight min-cut graph partitioning algorithm with multiple balancing constraints, which reduces communication overhead and statically balances the computation across machines. It further reduces communication by replicating halo nodes and by using sparse embedding updates. Together, these design choices allow DistDGL to train high-quality models while achieving high parallel efficiency and memory scalability. We demonstrate our optimizations on both inductive and transductive GNN models. Our results show that DistDGL achieves linear speedup without compromising model accuracy and needs only 13 seconds to complete a training epoch on a graph with 100 million nodes and 3 billion edges using a cluster of 16 machines.
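The workflow the abstract describes maps onto DGL's dgl.distributed module, which DistDGL builds on. The sketch below shows the two pieces: an offline METIS-based min-cut partitioning call with balancing constraints, and a synchronous mini-batch training loop in which sampling may reach non-local (halo) nodes. This is a minimal illustration under assumptions, not the paper's actual training script: the graph name 'mygraph', the file paths, the feature/label field names, the fanouts, the batch size, and the two-layer GraphSAGE model are all placeholders, and exact signatures vary across DGL versions.

    # Minimal sketch, assuming DGL's dgl.distributed API (v0.5+).
    # All names, paths, and hyperparameters below are illustrative.
    import dgl
    import dgl.nn as dglnn
    import torch as th
    import torch.nn.functional as F

    # Offline step (run once): METIS-based min-cut partitioning with
    # balancing constraints. balance_ntypes spreads training nodes evenly
    # across partitions; balance_edges also balances per-partition edge
    # counts; num_hops controls how far halo nodes are replicated.
    # dgl.distributed.partition_graph(
    #     g, 'mygraph', num_parts=16, out_path='parts/', num_hops=1,
    #     balance_ntypes=g.ndata['train_mask'], balance_edges=True)

    class SAGE(th.nn.Module):
        """A two-layer GraphSAGE model operating on sampled blocks."""
        def __init__(self, in_feats, n_hidden, n_classes):
            super().__init__()
            self.layer1 = dglnn.SAGEConv(in_feats, n_hidden, 'mean')
            self.layer2 = dglnn.SAGEConv(n_hidden, n_classes, 'mean')

        def forward(self, blocks, x):
            h = F.relu(self.layer1(blocks[0], x))
            return self.layer2(blocks[1], h)

    # On every trainer process.
    dgl.distributed.initialize('ip_config.txt')        # connect to graph servers
    th.distributed.init_process_group(backend='gloo')  # synchronous gradient sync
    g = dgl.distributed.DistGraph('mygraph', part_config='parts/mygraph.json')

    # Owner-compute rule: each trainer handles the training nodes that
    # live in its own partition.
    train_nids = dgl.distributed.node_split(g.ndata['train_mask'],
                                            g.get_partition_book())

    model = th.nn.parallel.DistributedDataParallel(SAGE(100, 64, 47))
    opt = th.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(10):
        perm = train_nids[th.randperm(len(train_nids))]
        for seeds in th.split(perm, 1000):
            # Sample a two-hop ego-network around the seeds; sampling may
            # reach non-local (halo) nodes served by remote partitions.
            blocks, cur = [], seeds
            for fanout in [25, 10]:
                frontier = dgl.distributed.sample_neighbors(g, cur, fanout)
                block = dgl.to_block(frontier, cur)
                cur = block.srcdata[dgl.NID]
                blocks.insert(0, block)

            # Pull input features and labels; reads for halo nodes go
            # over the network, everything else hits local memory.
            x = g.ndata['feat'][cur]
            y = g.ndata['label'][seeds]

            loss = F.cross_entropy(model(blocks, x), y)
            opt.zero_grad()
            loss.backward()  # DDP all-reduces dense gradients each step
            opt.step()

Because the min-cut partitioning keeps most sampled neighbors on the trainer's own machine, the feature pulls above are mostly local reads; the synchronous all-reduce keeps dense model parameters consistent, while learnable embeddings, when present, receive sparse updates.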

Related research

- Distributed Hybrid CPU and GPU training for Graph Neural Networks on Billion-Scale Graphs (12/31/2021)
- DistGNN-MB: Distributed Large-Scale Graph Neural Network Training on x86 via Minibatch Sampling (11/11/2022)
- An Experimental Comparison of Partitioning Strategies for Distributed Graph Neural Network Training (08/29/2023)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks (04/14/2021)
- Communication-Free Distributed GNN Training with Vertex Cut (08/06/2023)
- Tuple Packing: Efficient Batching of Small Graphs in Graph Neural Networks (09/14/2022)
- SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage Processing Architectures (05/10/2022)
