ButterFly BFS – An Efficient Communication Pattern for Multi Node Traversals

03/25/2021
by   Oded Green, et al.
0

Breadth-First Search (BFS) is a building block used in a wide array of graph analytics and is used in various network analysis domains: social, road, transportation, communication, and much more. Over the last two decades, network sizes have continued to grow. The popularity of BFS has brought with it a need for significantly faster traversals. Thus, BFS algorithms have been designed to exploit shared-memory and shared-nothing systems – this includes algorithms for accelerators such as the GPU. GPUs offer extremely fast traversals at the cost of processing smaller graphs due to their limited memory size. In contrast, CPU shared-memory systems can scale to graphs with several billion edges but do not have enough compute resources needed for fast traversals. This paper introduces ButterFly BFS, a multi-GPU traversal algorithm that allows analyzing significantly larger networks at high rates. ButterFly BFS scales to the similar-sized graphs processed by shared-memory systems while improving performance by more than 10X compared to CPUs. We evaluate our new algorithm on an NVIDIA DGX-2 server with 16 V100 GPUS and show that our algorithm scales with an increase in the number of GPUS. We show that we can achieve a roughly 70% performance linear speedup, which is non-trivial for BFS. For a scale 29 Kronecker graph and edge factor of 8, our new algorithm traverses the graph at a rate of over 300 GTEP/s. That is a high traversal rate for a single server.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/05/2020

MGPU-TSM: A Multi-GPU System with Truly Shared Memory

The sizes of GPU applications are rapidly growing. They are exhausting t...
research
09/16/2021

Efficient Scaling of Dynamic Graph Neural Networks

We present distributed algorithms for training dynamic Graph Neural Netw...
research
06/14/2023

GraphVine: A Data Structure to Optimize Dynamic Graph Processing on GPUs

Graph processing on GPUs is gaining momentum due to the high throughputs...
research
11/06/2017

Fast Integral Histogram Computations on GPU for Real-Time Video Analytics

In many Multimedia content analytics frameworks feature likelihood maps ...
research
10/14/2019

A High-Throughput Solver for Marginalized Graph Kernels on GPU

We present the design of a solver for the efficient and high-throughput ...
research
10/30/2020

To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations

We reduce the cost of communication and synchronization in graph process...
research
08/31/2019

Implicit Hari–Zimmermann algorithm for the generalized SVD on the GPUs

A parallel, blocked, one-sided Hari–Zimmermann algorithm for the general...

Please sign up or login with your details

Forgot password? Click here to reset