Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses

06/28/2023
by Jeongmin Brian Park, et al.

Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data and performing sophisticated inference tasks in various application domains. Although GNNs have been shown to be effective on modest-sized graphs, training them on large-scale graphs remains a significant challenge due to the lack of efficient data access and data movement methods. Existing GNN training frameworks use CPUs for graph sampling and feature aggregation, while the training and updating of model weights are executed on GPUs. However, our in-depth profiling shows that CPUs cannot achieve the throughput required to saturate GNN model training, causing gross under-utilization of expensive GPU resources. Furthermore, when the graph and its embeddings do not fit in CPU memory, the overhead introduced by the operating system, e.g., for handling page faults, falls on the critical path of execution. To address these issues, we propose the GPU Initiated Direct Storage Access (GIDS) dataloader, which enables GPU-oriented GNN training for large-scale graphs while efficiently utilizing all hardware resources, such as CPU memory, storage, and GPU memory, through a hybrid data placement strategy. By enabling GPU threads to fetch feature vectors directly from storage, the GIDS dataloader solves the memory capacity problem for GPU-oriented GNN training. Moreover, the GIDS dataloader leverages GPU parallelism to tolerate storage latency and eliminates expensive page-fault overhead. Doing so enables us to design novel optimizations for exploiting locality and increasing effective bandwidth for GNN training. Our evaluation on terabyte-scale GNN datasets with a single GPU shows that the GIDS dataloader accelerates the overall DGL GNN training pipeline by up to 392X compared to the current state-of-the-art DGL dataloader.
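To make the data path concrete, below is a minimal sketch of what a GPU-oriented DGL training loop with such a dataloader could look like. The `GIDSDataLoader` class, its constructor arguments (`feature_path`, `gpu_cache_bytes`, `cpu_cache_bytes`), and the input file names are hypothetical illustrations of the ideas in the abstract, not the actual GIDS API; the sampler and model pieces are standard DGL.

```python
import torch
import torch.nn.functional as F
import dgl
import dgl.nn as dglnn

# Hypothetical import: a stand-in for the GIDS dataloader described above;
# the real package name and API may differ.
from gids import GIDSDataLoader

device = torch.device("cuda")

# Graph topology is kept in memory; raw feature vectors stay on NVMe storage
# and are fetched directly by GPU threads during feature aggregation.
graph = dgl.load_graphs("paper_graph.bin")[0][0]
train_nids = torch.nonzero(graph.ndata["train_mask"], as_tuple=True)[0]
sampler = dgl.dataloading.NeighborSampler([10, 5])  # two-hop fanouts

loader = GIDSDataLoader(           # hypothetical class and arguments
    graph, train_nids, sampler,
    feature_path="features.nvme",  # hypothetical: feature file on SSD
    gpu_cache_bytes=8 << 30,       # hot features cached in GPU memory
    cpu_cache_bytes=64 << 30,      # warm features kept in host memory
    batch_size=1024,
    device=device,
)

class SAGE(torch.nn.Module):
    """A minimal two-layer GraphSAGE model matching the two-hop sampler."""
    def __init__(self, in_feats, hidden, n_classes):
        super().__init__()
        self.conv1 = dglnn.SAGEConv(in_feats, hidden, "mean")
        self.conv2 = dglnn.SAGEConv(hidden, n_classes, "mean")

    def forward(self, blocks, x):
        return self.conv2(blocks[1], F.relu(self.conv1(blocks[0], x)))

# Feature width and class count are placeholders for this sketch.
model = SAGE(in_feats=1024, hidden=256, n_classes=172).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for input_nodes, output_nodes, blocks in loader:
    # Feature gathering happens inside the loader: GPU threads issue direct
    # storage reads for cold features, using massive parallelism to hide
    # NVMe latency instead of relying on CPU page-fault handling.
    x = blocks[0].srcdata["feat"]
    y = blocks[-1].dstdata["label"]
    loss = F.cross_entropy(model(blocks, x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design point mirrored here is that the loop never materializes a CPU-side feature tensor: sampling output, feature fetches, and gradient computation all stay on the GPU, so storage bandwidth rather than CPU throughput becomes the limiting resource.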

Related research

01/20/2021 · PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses
08/19/2022 · Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching
01/18/2023 · ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training
05/10/2022 · SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage Processing Architectures
05/18/2023 · Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness
11/10/2021 · Graph Neural Network Training with Data Tiering
07/19/2021 · ZIPPER: Exploiting Tile- and Operator-level Parallelism for General and Scalable Graph Neural Network Acceleration
