AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs

08/17/2022
by   Chendi Li, et al.

In recent years, general matrix-matrix multiplication with non-regular-shaped input matrices has been widely used in applications such as deep learning and has drawn increasing attention. However, conventional implementations are not suited to non-regular-shaped matrix-matrix multiplication, and few works focus on optimizing tall-and-skinny matrix-matrix multiplication on CPUs. This paper proposes an auto-tuning framework, AutoTSMM, to build high-performance tall-and-skinny matrix-matrix multiplication. AutoTSMM selects the optimal inner kernels in the install-time stage and generates an execution plan for the pre-pack tall-and-skinny matrix-matrix multiplication in the runtime stage. Experiments demonstrate that AutoTSMM achieves performance competitive with state-of-the-art tall-and-skinny matrix-matrix multiplication and outperforms all conventional matrix-matrix multiplication implementations.
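
The two-stage idea in the abstract (pick a kernel once at install time, then build an execution plan at run time) can be illustrated with a small sketch. This is not AutoTSMM's implementation: the function names `matmul_blocked`, `select_kernel`, and `make_plan`, the use of a row-panel height as the only tuning parameter, and the pure-Python kernels are all assumptions made for illustration. A real framework would tune hand-written SIMD inner kernels instead.

```python
import time


def matmul_blocked(A, B, mb):
    """Compute C = A * B for a tall A (M >> K, N), one row panel of height mb at a time."""
    M, K, N = len(A), len(B), len(B[0])
    # Pre-pack B once as a list of columns so each dot product walks two plain lists.
    packed_b = [[B[k][j] for k in range(K)] for j in range(N)]
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, mb):
        for i in range(i0, min(i0 + mb, M)):
            row = A[i]
            C[i] = [sum(a * b for a, b in zip(row, col)) for col in packed_b]
    return C


def select_kernel(candidates, M=256, K=8, N=8, repeats=3):
    """Install-time stage (hypothetical): time each candidate panel height, keep the fastest."""
    A = [[float(i + k) for k in range(K)] for i in range(M)]
    B = [[float(k - j) for j in range(N)] for k in range(K)]
    best_mb, best_time = None, float("inf")
    for mb in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            matmul_blocked(A, B, mb)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_mb, best_time = mb, elapsed
    return best_mb


def make_plan(M, mb):
    """Runtime stage (hypothetical): split M rows into panels of the tuned height."""
    return [(i0, min(i0 + mb, M)) for i0 in range(0, M, mb)]


if __name__ == "__main__":
    tuned_mb = select_kernel([4, 16, 64])
    print("tuned panel height:", tuned_mb)
    print("plan for M=10, mb=4:", make_plan(10, 4))
```

Which panel height wins depends on the machine, which is the reason for measuring at install time rather than hard-coding a value.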

research
07/16/2023

New Bounds for Matrix Multiplication: from Alpha to Omega

The main contribution of this paper is a new improved variant of the las...
research
08/11/2022

Optimizing Irregular-Shaped Matrix-Matrix Multiplication on Multi-Core DSPs

General Matrix Multiplication (GEMM) has a wide range of applications in...
research
02/09/2020

ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs

Linear algebra operations have been widely used in big data analytics an...
research
03/19/2017

CLTune: A Generic Auto-Tuner for OpenCL Kernels

This work presents CLTune, an auto-tuner for OpenCL kernels. It evaluate...
research
06/27/2020

JAMPI: efficient matrix multiplication in Spark using Barrier Execution Mode

The new barrier mode in Apache Spark allows embedding distributed deep l...
research
05/12/2023

AMULET: Adaptive Matrix-Multiplication-Like Tasks

Many useful tasks in data science and machine learning applications can ...
research
09/01/2016

BLISlab: A Sandbox for Optimizing GEMM

Matrix-matrix multiplication is a fundamental operation of great importa...
