GraphCage: Cache Aware Graph Processing on GPUs

04/03/2019
by Xuhao Chen, et al.
Efficient graph processing is challenging because graph algorithms are irregular. Accelerating irregular graph algorithms efficiently on GPUs is even more difficult, since the GPU's highly structured SIMT architecture is not a natural fit for irregular applications. Despite considerable previous effort spent on carefully mapping graph algorithms onto the GPU, the performance of graph processing on GPUs is still highly memory-latency bound, leading to low utilization of compute resources. Random memory accesses generated by the sparse graph data structure are the major cause of this significant memory access latency. Simply applying the conventional cache blocking technique proposed for matrix computation has limited benefit because of its significant overhead on the GPU. We propose GraphCage, a cache-centric optimization framework for highly efficient graph processing on GPUs. We first present a throughput-oriented cache blocking scheme (TOCAB) in both push and pull directions. Compared with conventional cache blocking, which suffers from repeated accesses when processing large graphs on GPUs, TOCAB is specifically optimized for the GPU architecture to reduce this overhead and improve memory access efficiency. To integrate our scheme into state-of-the-art implementations without significant overhead, we coordinate TOCAB with load balancing strategies by considering the sparsity of subgraphs. To enable cache blocking for traversal-based algorithms, we consider the benefit and overhead in different iterations with different working set sizes, and apply TOCAB to topology-driven kernels in the pull direction. Evaluation shows that GraphCage can improve performance by 2-4x compared to hand-optimized implementations and state-of-the-art frameworks (e.g., CuSha and Gunrock), with less memory consumption than CuSha.
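
To make the idea of cache blocking concrete, the following is a minimal CPU-side sketch in Python of the conventional technique the abstract contrasts against. It is not TOCAB and not the authors' implementation. The function names (`block_edges_by_source`, `pull_sum_blocked`), the edge-list representation, and the `block_size` parameter are all assumptions made for illustration. Edges are grouped by source-vertex range so that, within one pass, every read of `values` falls inside a single contiguous range that could fit in cache.

```python
from collections import defaultdict


def block_edges_by_source(edges, block_size):
    """Group (src, dst) edges into subgraphs keyed by src // block_size.

    Hypothetical helper: each returned list only references source
    vertices from one contiguous range of width block_size.
    """
    blocks = defaultdict(list)
    for src, dst in edges:
        blocks[src // block_size].append((src, dst))
    return [blocks[key] for key in sorted(blocks)]


def pull_sum_blocked(edges, values, num_vertices, block_size):
    """For each vertex, sum the values of its in-neighbors, block by block."""
    sums = [0.0] * num_vertices
    for block in block_edges_by_source(edges, block_size):
        # Reads of `values` in this pass stay inside one source range.
        # Writes to `sums` may still touch any vertex, in every pass.
        for src, dst in block:
            sums[dst] += values[src]
    return sums


def pull_sum_unblocked(edges, values, num_vertices):
    """Reference version: same result, no control over access locality."""
    sums = [0.0] * num_vertices
    for src, dst in edges:
        sums[dst] += values[src]
    return sums


if __name__ == "__main__":
    edges = [(0, 1), (3, 1), (2, 0), (1, 2), (3, 2)]
    values = [1.0, 2.0, 3.0, 4.0]
    print(pull_sum_blocked(edges, values, num_vertices=4, block_size=2))
```

In this sketch the `sums` array is revisited once per block, which is plausibly the kind of repeated access the abstract attributes to conventional cache blocking on large graphs; how TOCAB reduces that overhead is described in the full paper, not here.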
