
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance graph algorithms are hard to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) the low arithmetic intensity of graph problems. To address some of these challenges, GraphBLAS is an innovative, ongoing effort by the graph analytics community to propose building blocks based on sparse linear algebra, which will allow graph algorithms to be expressed in a performant, succinct, composable, and portable manner. In this paper, we examine the performance challenges of a linear-algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. One of these principles is exploiting input sparsity, which allows users to write graph algorithms without specifying the push or pull direction. Another is exploiting output sparsity, which allows users to tell the backend which values of the output in a single vectorized computation they do not want computed. Load balancing distributes work evenly among parallel workers, and we describe the load-balancing features that matter for handling graphs with different characteristics. The design principles described in this paper have been implemented in "GraphBLAST", the first open-source, high-performance, linear-algebra-based graph framework on NVIDIA GPUs. The results show that on a single GPU, GraphBLAST has on average at least an order of magnitude speedup over the previous GraphBLAS implementations SuiteSparse and GBTL, performance comparable to the fastest GPU hardwired primitives and to the shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model.
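To make the ideas of input sparsity (push versus pull) and output sparsity (masking) concrete, here is a minimal sketch of a direction-switching breadth-first search in plain Python. It is not GraphBLAST's API or the authors' implementation: the function name `bfs_levels`, the parameter `dense_threshold`, and the simple frontier-size rule used to pick a direction are all assumptions made for illustration. The point is only that the caller never names a direction, and that already-visited vertices are skipped rather than computed.

```python
def bfs_levels(out_adj, source, dense_threshold=0.1):
    """Return (level, directions) for a BFS over adjacency lists.

    Hypothetical sketch: `dense_threshold` is an assumed tuning knob,
    not a parameter taken from the paper.
    """
    n = len(out_adj)

    # Build in-neighbour lists (the transposed graph) for the pull direction.
    in_adj = [[] for _ in range(n)]
    for u, nbrs in enumerate(out_adj):
        for v in nbrs:
            in_adj[v].append(u)

    level = [-1] * n          # -1 marks an unvisited vertex
    level[source] = 0
    frontier = {source}
    depth = 0
    directions = []           # records which direction each step used

    while frontier:
        depth += 1
        nxt = set()
        if len(frontier) < dense_threshold * n:
            # Push: the input (frontier) is sparse, so iterate over it
            # and scatter to out-neighbours.
            directions.append("push")
            for u in frontier:
                for v in out_adj[u]:
                    if level[v] == -1:
                        nxt.add(v)
        else:
            # Pull: the frontier is dense, so iterate over the outputs.
            # The mask (unvisited vertices only) is the output sparsity:
            # visited vertices are never recomputed.
            directions.append("pull")
            for v in range(n):
                if level[v] != -1:
                    continue
                for u in in_adj[v]:
                    if u in frontier:
                        nxt.add(v)
                        break  # one parent in the frontier is enough
        for v in nxt:
            level[v] = depth
        frontier = nxt

    return level, directions


# Small directed graph: 0->1, 0->2, 1->3, 2->3, 3->4
graph = [[1, 2], [3], [3], [4], []]
levels, steps = bfs_levels(graph, 0, dense_threshold=0.3)
```

Both directions compute the same levels; only the amount of work per step differs. Setting `dense_threshold` to 0 forces pull on every step and a value above 1 forces push, which is a convenient way to check that the two paths agree.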