IAAT: An Input-Aware Adaptive Tuning framework for Small GEMM

08/21/2022
by Jianyu Yao, et al.

GEMM on small input matrices is becoming widely used in fields such as HPC and machine learning. Although many well-known BLAS libraries already support small GEMM, they do not achieve near-optimal performance, because pack operations are costly and frequent boundary processing cannot be neglected. This paper proposes an input-aware adaptive tuning framework (IAAT) for small GEMM to overcome these performance bottlenecks in state-of-the-art implementations. IAAT consists of two stages: the install-time stage and the run-time stage. In the install-time stage, IAAT auto-generates hundreds of kernels of different sizes to remove pack operations. In the run-time stage, IAAT tiles matrices into blocks to alleviate boundary processing; this stage uses an input-aware adaptive tile algorithm and plays the role of runtime tuning. Finally, IAAT completes the small GEMM computation by invoking the kernels that correspond to the block sizes. The experimental results show that IAAT achieves better performance than other BLAS libraries on the ARMv8 platform.
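The two-stage idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the block extents in `KERNEL_EXTENTS`, the greedy `split_dim` rule, and the names `make_kernel` and `small_gemm` are all assumptions made for illustration. The "install-time" step builds one fixed-size kernel per block shape, and the "run-time" step tiles the output so that every block matches an existing kernel exactly, so no padded edge handling and no packing into temporary buffers is needed.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]

# Hypothetical block extents; the abstract does not list the real kernel sizes.
KERNEL_EXTENTS: Tuple[int, ...] = (4, 3, 2, 1)


def split_dim(n: int, extents: Tuple[int, ...] = KERNEL_EXTENTS) -> List[int]:
    """Greedily split a dimension of length n into supported extents.

    Because 1 is a supported extent, the pieces always sum to n exactly,
    so no block needs special boundary handling.
    """
    pieces: List[int] = []
    while n > 0:
        step = next(e for e in extents if e <= n)  # largest extent that fits
        pieces.append(step)
        n -= step
    return pieces


def make_kernel(mb: int, nb: int) -> Callable[[Matrix, Matrix, Matrix, int, int, int], None]:
    """Stand-in for an install-time generated kernel for one fixed block shape."""
    def kernel(A: Matrix, B: Matrix, C: Matrix, i0: int, j0: int, K: int) -> None:
        # Reads A and B in place (no packing) and updates an mb x nb block of C.
        for i in range(i0, i0 + mb):
            for j in range(j0, j0 + nb):
                acc = 0.0
                for k in range(K):
                    acc += A[i][k] * B[k][j]
                C[i][j] += acc
    return kernel


# "Install time": generate one kernel per supported block shape.
KERNELS: Dict[Tuple[int, int], Callable[..., None]] = {
    (mb, nb): make_kernel(mb, nb)
    for mb in KERNEL_EXTENTS
    for nb in KERNEL_EXTENTS
}


def small_gemm(A: Matrix, B: Matrix) -> Matrix:
    """"Run time": tile C by input size and dispatch each block to its kernel."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    i0 = 0
    for mb in split_dim(M):
        j0 = 0
        for nb in split_dim(N):
            KERNELS[(mb, nb)](A, B, C, i0, j0, K)
            j0 += nb
        i0 += mb
    return C
```

For a 7 x 5 output, for example, this sketch splits the rows into blocks of 4 and 3 and the columns into blocks of 4 and 1, then calls the matching four kernels. A real implementation would choose tilings based on measured kernel performance rather than a greedy rule.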

