MG-GCN: Scalable Multi-GPU GCN Training Framework

10/17/2021
by   Muhammed Fatih Balin, et al.

Full-batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs containing tens of millions of vertices or more. Recent work has shown that, for the graphs used in the machine learning community, communication becomes a bottleneck and prevents scaling beyond a single machine. We therefore propose MG-GCN, a multi-GPU GCN training framework that takes advantage of the high-speed communication links between the GPUs in multi-GPU systems. MG-GCN employs multiple High-Performance Computing optimizations, including efficient re-use of memory buffers to reduce the memory footprint of training GNN models, as well as overlap of communication and computation. These optimizations enable execution on larger datasets that generally do not fit into the memory of a single GPU in state-of-the-art implementations. They also contribute to superior speedup over the state of the art. For example, MG-GCN achieves super-linear speedup with respect to DGL on the Reddit graph on both DGX-1 (V100) and DGX-A100.

research
10/28/2018

Accurate, Efficient and Scalable Graph Embedding

The Graph Convolutional Network (GCN) model and its variants are powerfu...
research
03/04/2021

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture

Graph Convolutional Networks (GCNs) are increasingly adopted in large-sc...
research
07/15/2022

Multi-node Acceleration for Large-scale GCNs

Limited by the memory capacity and compute power, single-node graph convo...
research
11/01/2021

GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing

Recently, Graph Convolutional Networks (GCNs) have become state-of-the-a...
research
04/06/2022

Accelerating Backward Aggregation in GCN Training with Execution Path Preparing on GPUs

The emerging Graph Convolutional Network (GCN) has now been widely used ...
research
03/11/2018

Scalable Breadth-First Search on a GPU Cluster

On a GPU cluster, the ratio of high computing power to communication ban...