A model-driven approach for a new generation of adaptive libraries

06/19/2018
by Marco Cianfriglia, et al.

Efficient high-performance libraries often expose multiple tunable parameters to provide highly optimized routines. These range from simple loop unroll factors or vector sizes all the way to algorithmic changes, since some implementations suit certain devices better by exploiting hardware characteristics such as local memories and vector units. Traditionally, such parameters and algorithmic choices are tuned and then hard-coded for a specific architecture and for certain characteristics of the inputs. However, emerging applications are often data-driven, so traditional approaches are not effective across the wide range of inputs and architectures used in practice. In this paper, we present a new adaptive framework for data-driven applications that uses a predictive model, trained on synthetic and real datasets, to select the optimal algorithmic parameters. We demonstrate the effectiveness of the framework on a BLAS library, specifically on its matrix multiplication routine. We present experimental results for two GPU architectures and show significant performance gains of up to 3x (on a high-end NVIDIA Pascal GPU) and 2.5x (on an embedded ARM Mali GPU) when compared to a traditionally optimized library.
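The idea of selecting a kernel configuration from the input shape at run time can be illustrated with a minimal sketch. The abstract does not specify the model type, so this example assumes a simple 1-nearest-neighbour predictor; the configuration names, the `train` and `predict` helpers, and all timings are hypothetical and made up for illustration, not taken from the paper.

```python
import math

def features(m, n, k):
    """Map a GEMM shape to log-scaled features so sizes compare by ratio."""
    return (math.log2(m), math.log2(n), math.log2(k))

def train(samples):
    """Label each training shape with its fastest configuration.

    samples: list of ((m, n, k), {config_name: runtime_seconds}).
    Returns a list of (feature_vector, best_config_name).
    """
    return [(features(*shape), min(times, key=times.get))
            for shape, times in samples]

def predict(model, m, n, k):
    """Return the config that was fastest on the closest training shape.

    This is a 1-nearest-neighbour stand-in for the predictive model,
    an assumption made for this sketch.
    """
    x = features(m, n, k)
    _, best = min(model, key=lambda point: math.dist(point[0], x))
    return best

# Made-up timings (seconds) for two hypothetical kernel configurations.
TRAINING = [
    ((64, 64, 64), {"direct_small_tile": 0.8e-4, "tiled_local_mem": 1.5e-4}),
    ((128, 128, 128), {"direct_small_tile": 2.0e-4, "tiled_local_mem": 2.6e-4}),
    ((1024, 1024, 1024), {"direct_small_tile": 9.0e-2, "tiled_local_mem": 3.1e-2}),
    ((4096, 4096, 4096), {"direct_small_tile": 6.2, "tiled_local_mem": 1.9}),
]

model = train(TRAINING)

if __name__ == "__main__":
    for shape in [(100, 100, 100), (3000, 3000, 3000)]:
        print(shape, "->", predict(model, *shape))
```

In a real library the predicted label would index into a table of pre-compiled kernel variants, so the per-call overhead is a single cheap model evaluation rather than an exhaustive search.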

research
02/15/2018

Input-Aware Auto-Tuning of Compute-Bound HPC Kernels

Efficient implementations of HPC applications for parallel architectures...
research
12/09/2022

Towards a learning-based performance modeling for accelerating Deep Neural Networks

Emerging applications such as Deep Learning are often data-driven, thus ...
research
10/10/2019

DBCSR: A Library for Dense Matrix Multiplications on Distributed GPU-Accelerated Systems

Most, if not all the modern scientific simulation packages utilize matri...
research
06/20/2019

Program Generation for Linear Algebra Using Multiple Layers of DSLs

Numerical software in computational science and engineering often relies...
research
02/11/2020

AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation

Sequence alignments are fundamental to bioinformatics which has resulted...
research
01/11/2022

High Throughput Multidimensional Tridiagonal Systems Solvers on FPGAs

We present a design space exploration for synthesizing optimized, high-t...
research
05/06/2016

A Graph-based Model for GPU Caching Problems

Modeling data sharing in GPU programs is a challenging task because of t...
