HitGNN: High-throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform

03/02/2023
by   Yi-Chien Lin, et al.
0

As the size of real-world graphs increases, training Graph Neural Networks (GNNs) has become time-consuming and requires acceleration. While previous works have demonstrated the potential of utilizing FPGA for accelerating GNN training, few works have been carried out to accelerate GNN training with multiple FPGAs due to the necessity of hardware expertise and substantial development effort. To this end, we propose HitGNN, a framework that enables users to effortlessly map GNN training workloads onto a CPU-Multi-FPGA platform for acceleration. In particular, HitGNN takes the user-defined synchronous GNN training algorithm, GNN model, and platform metadata as input, determines the design parameters based on the platform metadata, and performs hardware mapping onto the CPU+Multi-FPGA platform, automatically. HitGNN consists of the following building blocks: (1) high-level application programming interfaces (APIs) that allow users to specify various synchronous GNN training algorithms and GNN models with only a handful of lines of code; (2) a software generator that generates a host program that performs mini-batch sampling, manages CPU-FPGA communication, and handles workload balancing among the FPGAs; (3) an accelerator generator that generates GNN kernels with optimized datapath and memory organization. We show that existing synchronous GNN training algorithms such as DistDGL and PaGraph can be easily deployed on a CPU+Multi-FPGA platform using our framework, while achieving high training throughput. Compared with the state-of-the-art frameworks that accelerate synchronous GNN training on a multi-GPU platform, HitGNN achieves up to 27.21x bandwidth efficiency, and up to 4.26x speedup using much less compute power and memory bandwidth than GPUs. In addition, HitGNN demonstrates good scalability to 16 FPGAs on a CPU+Multi-FPGA platform.

READ FULL TEXT
research
12/22/2021

HP-GNN: Generating High Throughput GNN Training Implementation on CPU-FPGA Heterogeneous Platform

Graph Neural Networks (GNNs) have shown great success in many applicatio...
research
09/07/2022

Hardware Acceleration of Sampling Algorithms in Sample and Aggregate Graph Neural Networks

Sampling is an important process in many GNN structures in order to trai...
research
06/17/2022

Low-latency Mini-batch GNN Inference on CPU-FPGA Heterogeneous Platform

Mini-batch inference of Graph Neural Networks (GNNs) is a key problem in...
research
08/23/2022

SASA: A Scalable and Automatic Stencil Acceleration Framework for Optimized Hybrid Spatial and Temporal Parallelism on HBM-based FPGAs

Stencil computation is one of the fundamental computing patterns in many...
research
12/31/2021

Distributed Hybrid CPU and GPU training for Graph Neural Networks on Billion-Scale Graphs

Graph neural networks (GNN) have shown great success in learning from gr...
research
02/02/2023

GraphAGILE: An FPGA-based Overlay Accelerator for Low-latency GNN Inference

This paper presents GraphAGILE, a domain-specific FPGA-based overlay acc...
research
05/21/2021

GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching

Graph neural networks (GNN) analysis engines are vital for real-world pr...

Please sign up or login with your details

Forgot password? Click here to reset