Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

02/11/2022
by Paolo Sylos Labini, et al.

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computationally expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since their features are specifically designed for tensor algebra (typically dense matrix products), it is commonly assumed that they are not suitable for applications with sparse data. To challenge this viewpoint, we discuss methods and present solutions for accelerating sparse matrix multiplication on such architectures. In particular, we present a 1-dimensional blocking algorithm, with theoretical guarantees on the density, that builds dense blocks from arbitrary sparse matrices. Experimental results show that, even for unstructured and highly sparse matrices, our block-based solution, which exploits Nvidia Tensor Cores, is faster than its sparse counterpart. We observed significant speed-ups of up to two orders of magnitude on real-world sparse matrices.
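To make the idea of 1-dimensional blocking concrete, the sketch below groups the rows of a sparse matrix so that rows with similar column-block patterns share the same dense blocks. It is only an illustration under stated assumptions, not the algorithm from the paper: the greedy first-fit grouping, the Jaccard similarity test, and the names `block_pattern`, `group_rows`, `block_density`, `width`, and `tau` are all hypothetical choices made here, and the sketch carries none of the paper's density guarantees.

```python
# Illustrative sketch only: a greedy 1-D (row) blocking heuristic.
# Assumption: the matrix is given as a list of sets, one per row,
# each holding the column indices of that row's nonzeros.

def block_pattern(cols, width):
    """Map a row's nonzero columns to the set of column blocks they touch."""
    return frozenset(c // width for c in cols)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_rows(rows, width, tau):
    """Greedily assign each row to the first group whose pattern is similar.

    A row joins a group when the Jaccard similarity between the row's
    block pattern and the group's accumulated pattern is at least tau;
    otherwise it starts a new group. Returns a list of
    (sorted column-block indices, row ids) pairs.
    """
    groups = []  # each entry: [accumulated pattern, list of row ids]
    for i, cols in enumerate(rows):
        p = block_pattern(cols, width)
        for g in groups:
            if jaccard(g[0], p) >= tau:
                g[0] = g[0] | p  # the group now stores the union pattern
                g[1].append(i)
                break
        else:
            groups.append([p, [i]])
    return [(sorted(pat), ids) for pat, ids in groups]


def block_density(rows, groups, width):
    """Fraction of stored block entries that are true nonzeros."""
    nnz = sum(len(r) for r in rows)
    # Each group stores len(pat) blocks, each `width` columns wide
    # and len(ids) rows tall.
    stored = sum(len(pat) * width * len(ids) for pat, ids in groups)
    return nnz / stored if stored else 0.0
```

As a small example, take four rows with nonzeros in columns {0, 1}, {4, 5}, {0, 1}, and {4, 5}, with `width=2` and `tau=0.5`. Rows 0 and 2 land in one group and rows 1 and 3 in another, and every stored block is fully dense. A lower `tau` merges more rows into each group at the cost of storing more explicit zeros, which is the trade-off a density guarantee is meant to control.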


research
09/29/2020

Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

Sparse general matrix-matrix multiplication (spGEMM) is an essential com...
research
08/19/2019

A Computational Model for Tensor Core Units

To respond to the need of efficient training and inference of deep neura...
research
03/07/2022

Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance

Tensor Core is a mixed-precision matrix-matrix multiplication unit on NV...
research
10/08/2021

Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators

Many of today's deep neural network accelerators, e.g., Google's TPU and...
research
12/07/2020

SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition

Sparse Tucker Decomposition (STD) algorithms learn a core tensor and a g...
research
03/27/2023

Maple: A Processing Element for Row-Wise Product Based Sparse Tensor Accelerators

Sparse tensor computing is a core computational part of numerous applica...
research
11/24/2018

Accelerating Reduction and Scan Using Tensor Core Units

Driven by deep learning, there has been a surge of specialized processor...
