Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

07/26/2020
by   Zijing Gu, et al.

We implemented and optimized matrix multiplication between dense and block-sparse matrices on CUDA. We used TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With TVM's automatic parameter tuning, our implementation based on cross-thread reduction achieved performance competitive with or better than other state-of-the-art frameworks.
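To make the operation concrete, here is a minimal NumPy reference sketch of a dense-times-block-sparse product. It is not the authors' TVM schedule or CUDA kernel. It assumes the sparse operand `w` is stored in block compressed sparse row (BSR) form as `data`, `indices`, and `indptr`, and that the product computed is `x @ w.T`. The abstract specifies neither the layout nor the orientation, and the helper names `dense_to_bsr` and `bsr_dense_matmul` are hypothetical.

```python
import numpy as np


def dense_to_bsr(w, bs_r, bs_c):
    """Convert a dense matrix to BSR arrays, keeping only nonzero blocks.

    Hypothetical helper for illustration; assumes the shape of w is an
    exact multiple of the block shape (bs_r, bs_c).
    """
    n, k = w.shape
    blocks, indices, indptr = [], [], [0]
    for i in range(n // bs_r):
        for j in range(k // bs_c):
            block = w[i * bs_r:(i + 1) * bs_r, j * bs_c:(j + 1) * bs_c]
            if np.any(block):
                blocks.append(block)
                indices.append(j)
        # indptr[i + 1] marks where block row i ends in `indices`.
        indptr.append(len(indices))
    if blocks:
        data = np.stack(blocks)
    else:
        data = np.zeros((0, bs_r, bs_c), dtype=w.dtype)
    return data, np.array(indices, dtype=np.int64), np.array(indptr, dtype=np.int64)


def bsr_dense_matmul(x, data, indices, indptr):
    """Compute y = x @ w.T, where w is given in BSR form.

    x:       dense array of shape (m, k)
    data:    nonzero blocks of w, shape (num_blocks, bs_r, bs_c)
    indices: block-column index of each stored block
    indptr:  offsets into `indices` for each block row
    """
    _, bs_r, bs_c = data.shape
    n_block_rows = len(indptr) - 1
    m = x.shape[0]
    y = np.zeros((m, n_block_rows * bs_r), dtype=x.dtype)
    for i in range(n_block_rows):
        # Only the stored (nonzero) blocks of block row i contribute.
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            x_slice = x[:, j * bs_c:(j + 1) * bs_c]
            # This accumulation is the reduction over the shared dimension.
            y[:, i * bs_r:(i + 1) * bs_r] += x_slice @ data[p].T
    return y
```

The inner accumulation over stored blocks is the reduction axis of the operation. A GPU implementation could split that sum across the threads of a block and combine the partial results, which is the general idea behind cross-thread reduction; the loop above performs it serially.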

research
07/04/2023

SpComp: A Sparsity Structure-Specific Compilation of Matrix Operations

Sparse matrix operations involve a large number of zero operands which m...
research
05/07/2020

TIRAMISU: A Polyhedral Compiler for Dense and Sparse Deep Learning

In this paper, we demonstrate a compiler that can optimize sparse and re...
research
12/10/2022

Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation

Sparse matrix representations are ubiquitous in computational science an...
research
10/12/2018

Expressing Sparse Matrix Computations for Productive Performance on Spatial Architectures

This paper addresses spatial programming of sparse matrix computations f...
research
04/28/2022

Programming Matrices as Staged Sparse Rows to Generate Efficient Matrix-free Differential Equation Solver

Solving differential equations is a critical task in scientific computin...
research
10/27/2022

Bootstrapped Block Lanczos for large-dimension eigenvalue problems

The Lanczos algorithm has proven itself to be a valuable matrix eigensol...
research
09/10/2019

Efficient Interleaved Batch Matrix Solvers for CUDA

In this paper we present a new methodology for data accesses when solvin...