Multi-node Acceleration for Large-scale GCNs

07/15/2022
by Gongjian Sun, et al.

Limited by memory capacity and compute power, single-node graph convolutional network (GCN) accelerators cannot complete the execution of GCNs on today's explosively large graphs within a reasonable amount of time. Large-scale GCNs therefore call for a multi-node acceleration system (MultiAccSys), analogous to TPU-Pod for large-scale neural networks. In this work, we aim to scale up single-node GCN accelerators to accelerate GCNs on large-scale graphs. We first identify the communication patterns and challenges of multi-node acceleration for GCNs on large-scale graphs. We observe that (1) coarse-grained communication patterns exist in the execution of GCNs in a MultiAccSys, which introduces a massive amount of redundant network transmissions and off-chip memory accesses; and (2) overall, the acceleration of GCNs in a MultiAccSys is bandwidth-bound and latency-tolerant. Guided by these two observations, we then propose MultiGCN, the first MultiAccSys for large-scale GCNs, which trades network latency for network bandwidth. Specifically, leveraging the tolerance to network latency, we first propose a topology-aware multicast mechanism with a one-put-per-multicast message-passing model to reduce transmissions and alleviate the network bandwidth requirement. Second, we introduce a scatter-based round execution mechanism that cooperates with the multicast mechanism and reduces redundant off-chip memory accesses. Compared to the baseline MultiAccSys, MultiGCN achieves a 4-12x speedup using only 28% of the energy, while reducing network transmissions by 32% on average. It not only achieves a 2.5-8x speedup over the state-of-the-art multi-GPU solution, but also scales to large-scale graphs, unlike single-node GCN accelerators.
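To illustrate the bandwidth saving behind the one-put-per-multicast model, the minimal sketch below contrasts it with a naive one-put-per-edge model. It is only a hedged approximation of the idea described in the abstract, not the authors' implementation; the edge-list representation, the `owner` map assigning vertices to accelerator nodes, and both function names are assumptions introduced here for illustration.

```python
# Hypothetical sketch (not the MultiGCN code): compare a naive unicast
# message-passing model, which puts a vertex feature once per cross-node
# edge, with a "one put per multicast" model, which puts it at most once
# to each remote node that hosts any of its neighbors.

from collections import defaultdict

def unicast_puts(edges, owner):
    """One put per cross-node edge (naive baseline)."""
    return sum(1 for src, dst in edges if owner[src] != owner[dst])

def multicast_puts(edges, owner):
    """One put per (source vertex, remote destination node) pair."""
    targets = defaultdict(set)               # src vertex -> remote nodes needing it
    for src, dst in edges:
        if owner[src] != owner[dst]:
            targets[src].add(owner[dst])
    return sum(len(nodes) for nodes in targets.values())

# Toy example: 4 vertices partitioned across 2 accelerator nodes.
owner = {0: 0, 1: 0, 2: 1, 3: 1}
edges = [(0, 2), (0, 3), (1, 2), (2, 0), (2, 1)]
print(unicast_puts(edges, owner))    # 5 puts, one per cross-node edge
print(multicast_puts(edges, owner))  # 3 puts, deduplicated per destination node
```

Deduplicating puts per destination node is what allows the multicast mechanism to trade some extra latency for a lower aggregate network bandwidth demand, which the abstract argues GCN execution can tolerate.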


research · 10/17/2021
MG-GCN: Scalable Multi-GPU GCN Training Framework
Full batch training of Graph Convolutional Network (GCN) models is not f...

research · 05/15/2022
COIN: Communication-Aware In-Memory Acceleration for Graph Convolutional Networks
Graph convolutional networks (GCNs) have shown remarkable learning capab...

research · 07/24/2023
HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation
Heterogeneous graph neural networks (HGNNs) have emerged as powerful alg...

research · 07/13/2021
FLAT: An Optimized Dataflow for Mitigating Attention Performance Bottlenecks
Attention mechanisms form the backbone of state-of-the-art machine learn...

research · 04/19/2023
Massive Data-Centric Parallelism in the Chiplet Era
Recent works have introduced task-based parallelization schemes to accel...

research · 08/26/2021
Efficient On-Chip Communication for Parallel Graph-Analytics on Spatial Architectures
Large-scale graph processing has drawn great attention in recent years. ...

research · 01/16/2018
Inter-thread Communication in Multithreaded, Reconfigurable Coarse-grain Arrays
Traditional von Neumann GPGPUs only allow threads to communicate through...
