Accelerating Sparse Approximate Matrix Multiplication on GPUs

03/24/2021
by Xiaoyan Liu, et al.

Although matrix multiplication plays a vital role in computational linear algebra, there are few efficient solutions for multiplying near-sparse matrices. Sparse Approximate Matrix Multiply (SpAMM) is one of the algorithms that fill the performance gap left by traditional optimizations for dense and sparse matrix multiplication. However, existing SpAMM algorithms fail to exploit the performance potential of GPUs. In this paper, we present cuSpAMM, the first parallel SpAMM algorithm optimized for multiple GPUs. We propose several performance optimizations, including an algorithm redesign that adapts to thread parallelism, blocking strategies that optimize memory access, and acceleration with tensor cores. In addition, we scale cuSpAMM to multiple GPUs with an effective load-balancing scheme. We evaluate cuSpAMM on both synthesized and real-world datasets on multiple GPUs. The experimental results show that cuSpAMM achieves significant speedups over the vendor-optimized cuBLAS and cuSPARSE libraries.
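
The abstract does not spell out how SpAMM works. As a rough illustration only, the sketch below follows the commonly described idea behind SpAMM: a block product is skipped when the product of the two blocks' Frobenius norms falls below a threshold. It is a plain CPU version with flat blocking, not the cuSpAMM GPU algorithm; the names `frob_norm`, `spamm_blocked`, `block`, and `tau` are hypothetical and chosen for this example.

```python
import math


def frob_norm(M, r0, c0, b):
    """Frobenius norm of the b-by-b block of M with top-left corner (r0, c0)."""
    return math.sqrt(sum(M[r][c] ** 2
                         for r in range(r0, r0 + b)
                         for c in range(c0, c0 + b)))


def spamm_blocked(A, B, block, tau):
    """Approximate C = A * B for square n-by-n lists of lists.

    Assumes n is divisible by `block`. A block product A[i,k] * B[k,j] is
    skipped when ||A[i,k]||_F * ||B[k,j]||_F < tau.
    Returns (C, number_of_skipped_block_products).
    """
    n = len(A)
    nb = n // block

    # Precompute the norm of every block once, so the skip test is cheap.
    norm_a = [[frob_norm(A, i * block, k * block, block) for k in range(nb)]
              for i in range(nb)]
    norm_b = [[frob_norm(B, k * block, j * block, block) for j in range(nb)]
              for k in range(nb)]

    C = [[0.0] * n for _ in range(n)]
    skipped = 0
    for i in range(nb):
        for j in range(nb):
            for k in range(nb):
                # Norm test: a small norm product means a small contribution.
                if norm_a[i][k] * norm_b[k][j] < tau:
                    skipped += 1
                    continue
                # Otherwise accumulate the dense block product into C[i,j].
                for r in range(i * block, (i + 1) * block):
                    for c in range(j * block, (j + 1) * block):
                        acc = 0.0
                        for t in range(k * block, (k + 1) * block):
                            acc += A[r][t] * B[t][c]
                        C[r][c] += acc
    return C, skipped


if __name__ == "__main__":
    # A near-sparse test matrix: entries decay quickly away from the diagonal.
    n = 4
    A = [[math.exp(-2 * abs(i - j)) for j in range(n)] for i in range(n)]
    C, skipped = spamm_blocked(A, A, block=2, tau=0.05)
    print("skipped block products:", skipped)
```

With `tau = 0` no product is skipped and the result matches an ordinary blocked multiply up to rounding. Raising `tau` trades accuracy for fewer block products, which is the effect that makes the approach attractive for matrices whose entries decay away from the diagonal.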

09/29/2020

Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

Sparse general matrix-matrix multiplication (spGEMM) is an essential com...
11/13/2015

Large Scale Artificial Neural Network Training Using Multi-GPUs

This paper describes a method for accelerating large scale Artificial Ne...
05/29/2020

Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format

Multiplication of a sparse matrix to a dense matrix (SpDM) is widely use...
08/24/2018

Implementing Strassen's Algorithm with CUTLASS on NVIDIA Volta GPUs

Conventional GPU implementations of Strassen's algorithm (Strassen) typi...
02/26/2020

Bandwidth-Optimized Parallel Algorithms for Sparse Matrix-Matrix Multiplication using Propagation Blocking

Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in ...
09/29/2021

Accelerating Encrypted Computing on Intel GPUs

Homomorphic Encryption (HE) is an emerging encryption scheme that allows...
05/08/2019

Performance Engineering for a Tall Skinny Matrix Multiplication Kernel on GPUs

General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS lib...