Scalable Graph Convolutional Network Training on Distributed-Memory Systems

12/09/2022
by Gunduz Vehbi Demirci, et al.

Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs. Because the adjacency and vertex-feature data of large graphs exceed the memory of a single machine, scalable training algorithms and distributed-memory systems are necessary. Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges. We propose a highly parallel training algorithm that scales to large processor counts. In our solution, the large adjacency and vertex-feature matrices are partitioned among processors, and we exploit the vertex partitioning of the graph to use non-blocking point-to-point communication between processors for better scalability. To further reduce parallelization overheads, we introduce a sparse-matrix partitioning scheme based on a hypergraph partitioning model for full-batch training, and we propose a novel stochastic hypergraph model that encodes the expected communication volume in mini-batch training. We show the merits of the hypergraph model, previously unexplored for GCN training, over the standard graph partitioning model, which does not accurately encode communication costs. Experiments on real-world graph datasets demonstrate that the proposed algorithms achieve considerable speedups over alternative solutions. The communication-cost savings become even more pronounced at large processor counts, and the performance benefits are preserved both in deeper GCNs with more layers and on billion-scale graphs.
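To make the distributed computation concrete, below is a minimal sketch of one GCN layer's forward pass, H(l+1) = ReLU(A_hat · H(l) · W(l)), under a 1D row partitioning of the normalized adjacency matrix. This is an illustration of the general approach, not the paper's implementation: it assumes mpi4py and SciPy, and the function and parameter names (gcn_layer_forward, halo_sends, halo_recvs, halo_order) are hypothetical. The point is to show how vertex partitioning lets each processor fetch only the remote (halo) feature rows it actually needs via non-blocking point-to-point communication.

```python
"""Sketch: one GCN layer under 1D row partitioning with non-blocking
MPI halo exchange. Hypothetical names; assumes mpi4py and SciPy."""
import numpy as np
from mpi4py import MPI

def gcn_layer_forward(comm, A_local, H_local, W,
                      halo_sends, halo_recvs, halo_order):
    """A_local: locally owned rows of the normalized adjacency matrix
    (scipy CSR), columns indexed over local vertices followed by halo
    vertices. H_local: feature rows of locally owned vertices (float64).
    halo_sends: {rank: local row indices that rank needs}.
    halo_recvs: {rank: number of halo feature rows expected from rank}.
    halo_order: neighbor ranks in the order their halo rows appear in
    A_local's column space."""
    reqs = []
    # Post non-blocking receives for the halo feature rows first.
    recv_bufs = {r: np.empty((n, H_local.shape[1]))
                 for r, n in halo_recvs.items()}
    for r, buf in recv_bufs.items():
        reqs.append(comm.Irecv(buf, source=r, tag=0))
    # Send the requested local feature rows, also non-blocking.
    send_bufs = {r: np.ascontiguousarray(H_local[idx])
                 for r, idx in halo_sends.items()}
    for r, buf in send_bufs.items():
        reqs.append(comm.Isend(buf, dest=r, tag=0))
    MPI.Request.Waitall(reqs)
    # Extended feature matrix: local rows first, halo rows after, in
    # the same order A_local's columns expect.
    H_ext = np.vstack([H_local] + [recv_bufs[r] for r in halo_order])
    # Local sparse-dense multiply, dense weight multiply, ReLU.
    return np.maximum((A_local @ H_ext) @ W, 0.0)
```

The communication volume of this scheme is fixed entirely by the partition: every halo row a processor needs must cross the network once per layer, which is why the choice of partitioning model matters so much at scale.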
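That communication volume is exactly what a hypergraph model encodes and an edge-cut graph model only approximates. As a rough illustration of the idea (again hypothetical code, not the paper's model), the classic column-net construction builds one net per matrix column; under the connectivity-1 metric, the cut of a row partition counts precisely how many copies of each feature row must be communicated.

```python
"""Sketch: column-net hypergraph model of communication volume for a
row-wise partition of a sparse matrix. Hypothetical names; assumes SciPy."""
from scipy.sparse import csc_matrix

def column_net_hypergraph(A):
    """Net j contains every row (vertex) whose computation reads
    column j of A; these nets are the input a hypergraph partitioner
    such as PaToH or KaHyPar would consume."""
    A = csc_matrix(A)
    return [A.indices[A.indptr[j]:A.indptr[j + 1]].tolist()
            for j in range(A.shape[1])]

def comm_volume(nets, part, owner):
    """Connectivity-1 cost of a partition: for each net j, every part
    that holds a pin but does not own vertex j must receive one copy
    of j's feature row.
    part: part id of each row; owner: part id owning each column's vertex."""
    vol = 0
    for j, pins in enumerate(nets):
        vol += len({part[i] for i in pins} - {owner[j]})
    return vol
```

For a square adjacency matrix, owner[j] is simply part[j]. The paper's full-batch scheme builds on a hypergraph model of this kind, while its stochastic mini-batch model extends the idea to expected communication volumes under sampling; this sketch covers only the deterministic case.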


