Low-latency Mini-batch GNN Inference on CPU-FPGA Heterogeneous Platform

06/17/2022
by Bingyi Zhang, et al.

Mini-batch inference of Graph Neural Networks (GNNs) is a key problem in many real-world applications. Recently, a GNN design principle of model depth-receptive field decoupling has been proposed to address the well-known issue of neighborhood explosion. Decoupled GNN models achieve higher accuracy than the original models and demonstrate excellent scalability for mini-batch inference. We map Decoupled GNNs onto CPU-FPGA heterogeneous platforms to achieve low-latency mini-batch inference. On the FPGA platform, we design a novel GNN hardware accelerator with an adaptive datapath, denoted Adaptive Computation Kernel (ACK), that can execute the various computation kernels of GNNs with low latency: (1) for dense computation kernels expressed as matrix multiplication, ACK works as a systolic array with fully localized connections; (2) for sparse computation kernels, ACK follows the scatter-gather paradigm and works as multiple parallel pipelines to support the irregular connectivity of graphs. The proposed task scheduling hides the CPU-FPGA data communication overhead to reduce the inference latency. We develop a fast design space exploration algorithm to generate a single accelerator for multiple target GNN models. We implement our accelerator on a state-of-the-art CPU-FPGA platform and evaluate the performance using three representative models (GCN, GraphSAGE, and GAT). Results show that our CPU-FPGA implementation achieves 21.4-50.8×, 2.9-21.6×, and 4.7× latency reduction compared with state-of-the-art implementations on CPU-only, CPU-GPU, and CPU-FPGA platforms, respectively.
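The two kernel types named in the abstract can be illustrated in software. The following is a minimal Python sketch, not the authors' hardware design: the function names (`dense_kernel`, `sparse_kernel`, `gcn_like_layer`), the edge-list format `(src, dst, weight)`, and the choice of weighted-sum aggregation are all assumptions made for illustration. It only shows the functional behavior of a dense matrix multiplication and of a scatter-gather pass over edges, not the systolic array or the parallel pipelines of ACK.

```python
def dense_kernel(h, w):
    """Dense kernel: feature transformation H x W as a matrix multiplication."""
    rows, inner, cols = len(h), len(w), len(w[0])
    return [
        [sum(h[i][k] * w[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def sparse_kernel(edges, h, num_vertices):
    """Sparse kernel: weighted-sum aggregation in scatter-gather style.

    edges is a list of (src, dst, weight) tuples (an assumed format).
    """
    dim = len(h[0])
    out = [[0.0] * dim for _ in range(num_vertices)]
    for src, dst, weight in edges:
        # Scatter: build an update from the source vertex's feature vector.
        update = [weight * x for x in h[src]]
        # Gather: accumulate the update into the destination vertex.
        for d in range(dim):
            out[dst][d] += update[d]
    return out


def gcn_like_layer(edges, h, w, num_vertices):
    """Hypothetical layer: dense transformation followed by sparse aggregation."""
    return sparse_kernel(edges, dense_kernel(h, w), num_vertices)
```

In this sketch every edge is processed independently, which is what makes the sparse kernel amenable to multiple parallel pipelines, while the dense kernel has a regular access pattern that suits a systolic array.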


research
03/22/2023

Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation

Graph Neural Network (GNN) inference is used in many real-world applicat...
research
02/02/2023

GraphAGILE: An FPGA-based Overlay Accelerator for Low-latency GNN Inference

This paper presents GraphAGILE, a domain-specific FPGA-based overlay acc...
research
09/28/2022

LL-GNN: Low Latency Graph Neural Networks on FPGAs for Particle Detectors

This work proposes a novel reconfigurable architecture for low latency G...
research
04/13/2021

MELOPPR: Software/Hardware Co-design for Memory-efficient Low-latency Personalized PageRank

Personalized PageRank (PPR) is a graph algorithm that evaluates the impo...
research
03/02/2023

HitGNN: High-throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform

As the size of real-world graphs increases, training Graph Neural Networ...
research
04/03/2021

Adaptive Filters and Aggregator Fusion for Efficient Graph Convolutions

Training and deploying graph neural networks (GNNs) remains difficult du...
research
10/07/2022

Efficient Computation of Map-scale Continuous Mutual Information on Chip in Real Time

Exploration tasks are essential to many emerging robotics applications, ...
