Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU

09/07/2022
by Genghan Zhang, et al.

Sparse compilers are a promising solution for sparse tensor algebra optimization. In compiler implementation, reduction in sparse-dense hybrid algebra plays a key role in performance. Although GPUs provide various reduction semantics that can better utilize their parallel computing and memory bandwidth capacity, the central question is how to elevate these flexible reduction semantics to a sparse compilation theory that assumes serial execution. Specifically, we have to tackle two main challenges: (1) a static synchronization granularity wastes parallelism, and (2) a static reduction strategy limits optimization space exploration. We propose Sgap: segment group and atomic parallelism to solve these problems. Atomic parallelism captures the flexible reduction semantics to systematically analyze the optimization space of sparse-dense hybrid algebra on GPU. It is a new optimization technique beyond those used in current compilers and open-source runtime libraries. Segment group elevates the flexible reduction semantics to suitable levels of abstraction in the sparse compilation theory. It adopts a changeable group size and a user-defined reduction strategy to solve challenges (1) and (2), respectively. Finally, we use GPU sparse matrix-matrix multiplication (SpMM) on the TACO compiler as a use case to demonstrate the effectiveness of segment group in reduction semantics elevation. We achieve up to a 1.2x speedup over the original TACO's SpMM kernels. We also apply new optimization techniques found by atomic parallelism to dgSPARSE, an open-source state-of-the-art SpMM library. We achieve a 1.6x to 2.3x speedup on the algorithm tuned with atomic parallelism.
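
To make the idea of a changeable group size more concrete, the following is a minimal CPU-side sketch in Python, not the Sgap, TACO, or dgSPARSE implementation. It simulates SpMM over a COO matrix in which consecutive nonzeros are handled by a "group" standing in for cooperating GPU threads; products that target the same output row are first combined inside the group, and only the per-row partial results are written to the output, standing in for atomic updates. The function name `spmm_grouped`, the `strategy` values, and the update counter are all hypothetical and exist only for illustration.

```python
# Illustrative simulation only: names, parameters, and structure are assumptions,
# not the API of Sgap, TACO, or dgSPARSE.

def spmm_grouped(rows, cols, vals, B, num_rows, group_size, strategy="segment"):
    """Compute C = A @ B for a COO matrix A (entries sorted by row) and dense B.

    strategy="atomic":  every product updates C directly (one simulated atomic
                        update per nonzero per output column).
    strategy="segment": nonzeros are split into groups of `group_size`; products
                        sharing an output row are summed inside the group first,
                        then each per-row partial result updates C once.
    Returns (C, number_of_simulated_atomic_updates).
    """
    if strategy not in ("atomic", "segment"):
        raise ValueError("strategy must be 'atomic' or 'segment'")
    if strategy == "atomic":
        group_size = 1  # no group-local reduction before touching C

    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(num_rows)]
    atomic_updates = 0

    for start in range(0, len(vals), group_size):
        end = min(start + group_size, len(vals))
        partial = {}  # output row -> partial result accumulated by this group
        for k in range(start, end):
            acc = partial.setdefault(rows[k], [0.0] * n_cols)
            for j in range(n_cols):
                acc[j] += vals[k] * B[cols[k]][j]
        for r, acc in partial.items():
            for j in range(n_cols):
                C[r][j] += acc[j]  # stands in for an atomic add on the GPU
                atomic_updates += 1
    return C, atomic_updates


# Small example: A = [[1, 2, 0], [0, 0, 3], [4, 0, 5]] in COO form.
rows = [0, 0, 1, 2, 2]
cols = [0, 1, 2, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

C_atomic, n_atomic = spmm_grouped(rows, cols, vals, B, 3, group_size=1, strategy="atomic")
C_seg2, n_seg2 = spmm_grouped(rows, cols, vals, B, 3, group_size=2)
C_seg5, n_seg5 = spmm_grouped(rows, cols, vals, B, 3, group_size=5)
```

In this toy model every configuration produces the same result matrix, while the number of simulated atomic updates changes with the group size. This is only meant to illustrate why a fixed synchronization granularity can be a poor fit for some sparsity patterns; it says nothing about the actual performance reported above.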


