Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture

03/04/2021
by   Seung Won Min, et al.

Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems. Training a GCN requires the minibatch generator to traverse the graph and sample the sparsely located neighboring nodes to obtain their features. Since real-world graphs often exceed the capacity of GPU memory, current GCN training systems keep the feature table in host memory and rely on the CPU to collect sparse features before sending them to the GPUs. This approach, however, puts tremendous pressure on host memory bandwidth and the CPU, because the CPU needs to (1) read sparse features from memory, (2) write the features back to memory in a dense format, and (3) transfer the features from memory to the GPUs. In this work, we propose a novel GPU-oriented data communication approach for GCN training, in which GPU threads directly access sparse features in host memory through zero-copy accesses with minimal CPU involvement. By removing the CPU gathering stage, our method significantly reduces host resource consumption and data access latency. We further present two important techniques to achieve high host-memory access efficiency from the GPU: (1) automatic data access address alignment to maximize PCIe packet efficiency, and (2) asynchronous zero-copy access and kernel execution to fully overlap data transfer with training. We incorporate our method into PyTorch and evaluate its effectiveness using several graphs with sizes up to 111 million nodes and 1.6 billion edges. In a multi-GPU training setup, our method is 65-92% faster than the conventional data transfer method, and can even match the performance of all-in-GPU-memory training for some graphs that fit in GPU memory.
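The abstract's core mechanism is easy to sketch in code. The following CUDA snippet is a minimal, hypothetical illustration of the zero-copy gathering pattern described above, not the authors' implementation: the kernel name, the warp-per-feature-row layout, the buffer sizes, and the use of cudaHostAlloc/cudaHostGetDevicePointer are assumptions made for the example. Node features stay in host-pinned memory that the GPU maps into its address space; GPU threads read the sampled rows directly over PCIe into a dense device buffer, bypassing the CPU gather-and-copy stage, and launching the kernel on its own stream lets the transfer overlap with training kernels.

```cuda
// Minimal sketch (not the paper's implementation): GPU-driven zero-copy
// gathering of sparse node features that reside in host-pinned memory.
#include <cuda_runtime.h>
#include <cstdio>

// One warp per sampled node: consecutive lanes read consecutive floats of a
// feature row, so the zero-copy reads over PCIe stay aligned and coalesced.
__global__ void gather_features(const float* __restrict__ host_features, // mapped host memory
                                const long*  __restrict__ node_ids,      // sampled node IDs (device)
                                float*       __restrict__ out,           // dense minibatch buffer (device)
                                int num_sampled, int feat_dim) {
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int lane = threadIdx.x % 32;
    if (warp >= num_sampled) return;
    const float* src = host_features + node_ids[warp] * (long)feat_dim;
    float*       dst = out + (long)warp * feat_dim;
    for (int i = lane; i < feat_dim; i += 32)
        dst[i] = src[i];  // direct zero-copy load from host memory
}

int main() {
    const int num_total = 1 << 18;   // nodes in the (hypothetical) feature table
    const int feat_dim  = 128;       // feature width
    const int batch     = 1024;      // sampled nodes per minibatch

    // Pinned + mapped allocation: the GPU can dereference this host buffer.
    float* h_feat = nullptr;
    cudaHostAlloc((void**)&h_feat, (size_t)num_total * feat_dim * sizeof(float),
                  cudaHostAllocMapped);
    float* d_feat_view = nullptr;    // device-side alias of the host buffer
    cudaHostGetDevicePointer((void**)&d_feat_view, h_feat, 0);

    long*  d_ids = nullptr;
    float* d_out = nullptr;
    cudaMalloc((void**)&d_ids, batch * sizeof(long));
    cudaMalloc((void**)&d_out, (size_t)batch * feat_dim * sizeof(float));
    cudaMemset(d_ids, 0, batch * sizeof(long));  // stand-in for real sampled IDs

    // Separate stream so the zero-copy gather can overlap training kernels.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    const int threads = 256;
    const int blocks  = (batch * 32 + threads - 1) / threads;
    gather_features<<<blocks, threads, 0, stream>>>(d_feat_view, d_ids, d_out,
                                                    batch, feat_dim);
    cudaStreamSynchronize(stream);
    printf("gathered %d feature rows of width %d\n", batch, feat_dim);

    cudaStreamDestroy(stream);
    cudaFree(d_out); cudaFree(d_ids); cudaFreeHost(h_feat);
    return 0;
}
```

Assigning one warp per feature row keeps consecutive lanes reading consecutive addresses, which is the kind of aligned, coalesced access pattern the abstract's address-alignment technique targets for full PCIe packet utilization.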


research · 10/17/2021
MG-GCN: Scalable Multi-GPU GCN Training Framework
Full batch training of Graph Convolutional Network (GCN) models is not f...

research · 06/12/2020
EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs
Modern analytics and recommendation systems are increasingly based on gr...

research · 04/17/2021
ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table
Because of the superior feature representation ability of deep learning,...

research · 06/23/2021
Weighted Random Sampling on GPUs
An alias table is a data structure that allows for efficiently drawing w...

research · 05/10/2019
Overcoming Limitations of GPGPU-Computing in Scientific Applications
The performance of discrete general purpose graphics processing units (G...

research · 08/18/2019
CHoNDA: Near Data Acceleration with Concurrent Host Access
Near-data accelerators (NDAs) that are integrated with main memory have ...

research · 02/19/2022
Distributed Out-of-Memory NMF of Dense and Sparse Data on CPU/GPU Architectures with Automatic Model Selection for Exascale Data
The need for efficient and scalable big-data analytics methods is more e...
