Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors

04/27/2023
by Héctor Martínez, et al.

This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level, architecture-dependent kernels in BLAS (Basic Linear Algebra Subprograms). Specifically, we propose customizing the GEMM (general matrix multiplication) kernel, which is invoked from the blocked algorithms for relevant matrix factorizations in LAPACK, to improve performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage an analytical model to dynamically adapt the cache configuration parameters of the GEMM to the shape of the matrix operands. Additionally, we accommodate the flexible development of architecture-specific micro-kernels, which allows us to further improve the utilization of the cache hierarchy. Our experiments on two platforms, equipped with ARM (NVIDIA Carmel, Neon) and x86 (AMD EPYC, AVX2) multicore processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. However, they also reveal the delicate balance between optimizing for multi-threaded parallelism versus cache usage.
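The loop structure the abstract refers to can be sketched as a Goto/BLIS-style blocked GEMM. The sketch below is illustrative only: the blocking parameters `MC`, `NC`, `KC` and the micro-tile sizes `MR`, `NR` are placeholder constants (in the paper they are chosen by an analytical model from the cache geometry and the operand shapes), the innermost scalar loops stand in for the architecture-specific Neon/AVX2 micro-kernels, and production code additionally packs the `A` and `B` blocks into contiguous buffers.

```c
/* Illustrative cache-blocking parameters; the paper's analytical model
   derives these from the cache sizes and the shape of the operands. */
enum { MC = 32, NC = 32, KC = 32, MR = 4, NR = 4 };

static int min_i(int a, int b) { return a < b ? a : b; }

/* C (m x n) += A (m x k) * B (k x n), row-major, blocked so that an
   NC-wide panel of B, a KC x MC block of A, and MR x NR micro-tiles
   of C are reused from the different cache levels. */
void gemm_blocked(int m, int n, int k,
                  const double *A, const double *B, double *C) {
    for (int jc = 0; jc < n; jc += NC)          /* panel of B      */
        for (int pc = 0; pc < k; pc += KC)      /* block of A      */
            for (int ic = 0; ic < m; ic += MC)  /* micro-panels    */
                for (int jr = jc; jr < min_i(jc + NC, n); jr += NR)
                    for (int ir = ic; ir < min_i(ic + MC, m); ir += MR)
                        /* scalar stand-in for the MR x NR micro-kernel;
                           real kernels keep the C micro-tile in SIMD
                           registers (Neon on ARM, AVX2 on x86). */
                        for (int p = pc; p < min_i(pc + KC, k); ++p)
                            for (int i = ir; i < min_i(ir + MR, m); ++i)
                                for (int j = jr; j < min_i(jr + NR, n); ++j)
                                    C[i * n + j] += A[i * k + p] * B[p * n + j];
}
```

The outer three loops fix which operand blocks live in which cache level, while the `jr`/`ir` loops sweep micro-tiles of `C`; adapting `MC`/`NC`/`KC` to the operand shape, as the paper proposes, matters precisely when the factorization calls GEMM with tall-and-skinny or short-and-wide blocks.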


research
05/19/2021

High performance and energy efficient inference for deep learning on ARM processors

We evolve PyDTNN, a framework for distributed parallel training of Deep ...
research
11/07/2020

FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks

We develop a fused matrix multiplication kernel that unifies sampled den...
research
06/01/2017

Performance Modeling and Prediction for Dense Linear Algebra

This dissertation introduces measurement-based performance modeling and ...
research
05/11/2018

Towards scalable pattern-based optimization for dense linear algebra

Linear algebraic expressions are the essence of many computationally int...
research
05/09/2019

Exploiting Fine-Grain Ordered Parallelism in Dense Matrix Algorithms

Dense linear algebra kernels are critical for wireless applications, and...
research
02/07/2021

Estimate The Efficiency Of Multiprocessor's Cash Memory Work Algorithms

Many computer systems for calculating the proper organization of memory ...
research
11/06/2015

Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors

Dense linear algebra libraries, such as BLAS and LAPACK, provide a relev...
