DBCSR: A Library for Dense Matrix Multiplications on Distributed GPU-Accelerated Systems

10/10/2019
by Ilia Sivkov, et al.

Most, if not all, modern scientific simulation packages rely on matrix algebra operations. Among the linear algebra operations, one of the most important kernels is the multiplication of matrices, both dense and sparse. This kernel is applied in electronic structure calculations, machine learning, data mining, graph processing, and digital signal processing. Several optimized libraries achieve high performance on distributed systems, but only a few of them target distributed GPU-accelerated systems. In most cases, these libraries are provided and optimized by system vendors for their specific computer systems. In this paper, we present the DBCSR library (Distributed Block Compressed Sparse Row) for distributed dense matrix-matrix multiplications. Although the library is specifically designed for block-sparse matrix-matrix multiplications, we optimized it for the dense case on GPU-accelerated systems. We show that, for matrices of different sizes and shapes, DBCSR outperforms the multiplication provided by a vendor-optimized GPU version of the ScaLAPACK library by up to 2.5x (1.4x on average).

research
10/29/2019

DBCSR: A Blocked Sparse Tensor Algebra Library

Advanced algorithms for large-scale electronic structure calculations ar...
research
11/16/2020

Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra

Sparse-dense linear algebra is crucial in many domains, but challenging ...
research
02/21/2019

The BLAS API of BLASFEO: optimizing performance for small matrices

BLASFEO is a dense linear algebra library providing high-performance imp...
research
06/19/2018

A model-driven approach for a new generation of adaptive libraries

Efficient high-performance libraries often expose multiple tunable param...
research
03/15/2023

A Two-level GPU-Accelerated Incomplete LU Preconditioner for General Sparse Linear Systems

This paper presents a parallel preconditioning approach based on incompl...
research
05/09/2023

Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra

Sparse linear algebra is crucial in many application domains, but challe...
research
03/29/2023

PopSparse: Accelerated block sparse matrix multiplication on IPU

Reducing the computational cost of running large scale neural networks u...
