
Implementing Push-Pull Efficiently in GraphBLAS
We factor Beamer's push-pull, also known as direction-optimized breadth...

Compilation Techniques for Graph Algorithms on GPUs
The performance of graph programs depends highly on the algorithm, the s...

An Adaptive Load Balancer For Graph Analytical Applications on GPUs
Load balancing graph analytics workloads on GPUs is difficult because of...

GPU-based Parallel Computation Support for Stan
This paper details an extensible OpenCL framework that allows Stan to ut...

Dynamic Load Balancing Strategies for Graph Applications on GPUs
Acceleration of graph applications on GPUs has found large interest due ...

clusterNOR: A NUMA-Optimized Clustering Framework
Clustering algorithms are iterative and have complex data access pattern...

Auto-Differentiating Linear Algebra
Development systems for deep learning, such as Theano, Torch, TensorFlow...
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance graph algorithms are challenging to implement on new parallel hardware such as GPUs for three reasons: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) the low arithmetic intensity of graph problems. To address some of these challenges, GraphBLAS is an innovative, ongoing effort by the graph analytics community to propose building blocks based on sparse linear algebra, which allow graph algorithms to be expressed in a performant, succinct, composable, and portable manner. In this paper, we examine the performance challenges of a linear-algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. Among the new design principles is exploiting input sparsity, which allows users to write graph algorithms without specifying push and pull direction. Exploiting output sparsity allows users to tell the backend which values of the output in a single vectorized computation they do not want computed. Load balancing distributes work evenly amongst parallel workers; we describe the load-balancing features that matter for handling graphs with different characteristics. The design principles described in this paper have been implemented in "GraphBLAST", the first open-source, high-performance, linear-algebra-based graph framework on NVIDIA GPUs. The results show that on a single GPU, GraphBLAST has on average at least an order of magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL, comparable performance to the fastest GPU hardwired primitives and shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model.
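To make the input-sparsity and output-sparsity ideas concrete, the following is a minimal sketch in plain Python of a breadth-first search that switches between a push step and a pull step. It is not GraphBLAST's API or implementation: the function name `bfs_levels`, the `pull_threshold` parameter, the frontier-size heuristic, and the sample graph are all assumptions made for illustration, and the graph is assumed to be undirected so that one adjacency list serves both directions.

```python
def bfs_levels(adj, source, pull_threshold=0.1):
    """Return {vertex: BFS depth} for every vertex reachable from source.

    adj maps each vertex to a list of neighbors. The graph is assumed
    undirected, so adj[v] also lists the in-neighbors of v.
    pull_threshold is a hypothetical tuning knob, not a GraphBLAST setting.
    """
    n = len(adj)
    depth = {source: 0}   # visited vertices; unvisited ones act as the mask
    frontier = {source}   # sparse input vector of the current step
    level = 0
    while frontier:
        level += 1
        if len(frontier) < pull_threshold * n:
            # Push: touch only the nonzeros of the input (input sparsity).
            candidates = {v for u in frontier for v in adj[u]}
            next_frontier = {v for v in candidates if v not in depth}
        else:
            # Pull: compute only outputs that are still unvisited
            # (output sparsity, expressed here as a mask on depth).
            next_frontier = {
                v for v in adj
                if v not in depth and any(u in frontier for u in adj[v])
            }
        for v in next_frontier:
            depth[v] = level
        frontier = next_frontier
    return depth


# Hypothetical sample graph: a path 0-1-2-3 plus an isolated vertex 4.
example_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
```

Both branches produce the same next frontier, so the choice only affects how much work is done. In this sketch the caller never names a direction; the step inspects the frontier size and picks one, which is the kind of decision the abstract says a backend can make on the user's behalf.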