ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs

02/09/2020
by Cody Rivera, et al.
Linear algebra operations are widely used in big data analytics and scientific computations. Much work has been done on optimizing linear algebra operations on GPUs with regular-shaped input. However, few works focus on fully utilizing GPU resources when the input is not regular-shaped. Current optimizations do not consider fully utilizing the memory bandwidth and computing power, so they achieve only sub-optimal performance. In this paper, we propose two efficient irregular-shaped matrix-matrix multiplication (GEMM) algorithms on GPUs, called TSM2 and ISM2. Both focus on optimizing GEMMs with various input sizes where at least one of the matrices is tall-and-skinny. We implement our proposed algorithms and test them on several modern Nvidia GPU micro-architectures. Experiments show that, compared to the state of the art, our TSM2 speeds up the computation by 1.1x to 3x and improves the memory bandwidth utilization and computing power utilization by 8 and 7 ... or medium. Moreover, our ISM2 speeds up the GEMM by 1.1x to 3.5x and improves the memory bandwidth utilization by up to 55 ... relatively small.

research
08/17/2022

AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs

In recent years, general matrix-matrix multiplication with non-regular-s...
research
05/29/2020

Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format

Multiplication of a sparse matrix to a dense matrix (SpDM) is widely use...
research
08/11/2022

Optimizing Irregular-Shaped Matrix-Matrix Multiplication on Multi-Core DSPs

General Matrix Multiplication (GEMM) has a wide range of applications in...
research
06/15/2022

OpSparse: a Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs

Sparse general matrix multiplication (SpGEMM) is an important and expens...
research
07/07/2023

CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs

Data compression and decompression have become vital components of big-d...
research
01/15/2018

SPIN: A Fast and Scalable Matrix Inversion Method in Apache Spark

The growth of big data in domains such as Earth Sciences, Social Network...
research
05/14/2019

Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems

Sparse matrix-vector multiplication (SpMV) operations are commonly used ...
