ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training

01/18/2023
by Kezhao Huang, et al.

A key performance bottleneck when training graph neural network (GNN) models on large, real-world graphs is loading node features onto a GPU. Because GPU memory is limited, these features must be stored on devices with slower access (e.g., CPU memory), making data movement expensive. Moreover, the irregularity of graph structures leads to poor data locality, which further exacerbates the problem. Consequently, existing frameworks capable of efficiently training large GNN models usually incur significant accuracy degradation because of the shortcuts they must take. To address these limitations, we instead propose ReFresh, a general-purpose GNN mini-batch training framework that leverages a historical cache to store and reuse GNN node embeddings rather than re-computing them by fetching raw features at every iteration. Critical to its success, the corresponding cache policy, built from a combination of gradient-based and staleness criteria, selectively separates embeddings that are relatively stable and can be cached from those that must be re-computed, reducing estimation errors and the resulting downstream accuracy loss. When paired with complementary system enhancements to support this selective historical cache, ReFresh accelerates training on large graph datasets such as ogbn-papers100M and MAG240M by 4.6x up to 23.6x and reduces memory access by 64.5%, without compromising test accuracy.
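To make the caching idea concrete, the minimal sketch below (PyTorch-style Python) illustrates one way a selective historical embedding cache with a staleness-plus-gradient admission test could be organized. All names and thresholds here (HistoricalEmbeddingCache, max_staleness, grad_threshold) are hypothetical illustrations, not ReFresh's actual API; the paper's real policy and data layout may differ.

    import torch

    class HistoricalEmbeddingCache:
        """Illustrative per-node cache of GNN embeddings (hypothetical sketch).

        An embedding is reused only while it looks "stable": its age
        (staleness) is below a threshold and the gradient magnitude seen
        the last time it was computed is small.
        """

        def __init__(self, num_nodes, dim, max_staleness=10, grad_threshold=1e-3):
            self.emb = torch.zeros(num_nodes, dim)                # cached embeddings
            self.age = torch.full((num_nodes,), float("inf"))     # iterations since last refresh
            self.grad_norm = torch.full((num_nodes,), float("inf"))
            self.max_staleness = max_staleness
            self.grad_threshold = grad_threshold

        def lookup(self, node_ids):
            """Partition node_ids into (reusable-from-cache, must-recompute)."""
            stable = (self.age[node_ids] < self.max_staleness) & \
                     (self.grad_norm[node_ids] < self.grad_threshold)
            return node_ids[stable], node_ids[~stable]

        def update(self, node_ids, embeddings, grad_norms):
            """Store freshly computed embeddings and reset their staleness."""
            self.emb[node_ids] = embeddings.detach()
            self.age[node_ids] = 0
            self.grad_norm[node_ids] = grad_norms

        def tick(self):
            """Advance staleness counters once per training iteration."""
            self.age += 1

In a training loop, lookup would split each mini-batch's nodes into those whose cached embeddings can be reused and those whose raw features must be fetched and re-encoded, while update and tick keep the stability metadata current; this is the mechanism by which reusing stable embeddings avoids repeated raw-feature transfers.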

Related research

06/28/2023
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses
Graph Neural Networks (GNNs) are emerging as a powerful tool for learnin...

01/20/2021
PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses
With the increasing adoption of graph neural networks (GNNs) in the mach...

11/10/2021
Graph Neural Network Training with Data Tiering
Graph Neural Networks (GNNs) have shown success in learning from graph-s...

04/21/2021
Accelerating SpMM Kernel with Cache-First Edge Sampling for Graph Neural Networks
Graph neural networks (GNNs), an emerging deep learning model class, can...

08/25/2023
Staleness-Alleviated Distributed GNN Training via Online Dynamic-Embedding Prediction
Despite the recent success of Graph Neural Networks (GNNs), it remains c...

08/26/2021
GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware
Sampling is a critical operation in the training of Graph Neural Network...

09/20/2023
InkStream: Real-time GNN Inference on Streaming Graphs via Incremental Update
Classic Graph Neural Network (GNN) inference approaches, designed for st...
