Optimization of the Sparse Multi-Threaded Cholesky Factorization for A64FX

02/18/2022
by Valentin Le Fèvre, et al.

Sparse linear algebra routines are fundamental building blocks of a large variety of scientific applications. Direct solvers, which solve linear systems by factorizing a matrix into a product of triangular matrices, are commonly used in many contexts. The Cholesky factorization is the fastest direct method for symmetric positive definite matrices. This paper presents selective nesting, a method to determine the optimal task granularity for the parallel Cholesky factorization based on the structure of sparse matrices. We propose the OPT-D-COST algorithm, which automatically and dynamically applies selective nesting. OPT-D-COST leverages matrix sparsity to drive complex task-based parallel workloads in the context of direct solvers. We run an extensive evaluation campaign considering a heterogeneous set of 60 sparse matrices and a parallel machine featuring the A64FX processor. OPT-D-COST delivers an average speedup of 1.46× over the best state-of-the-art parallel method for running direct solvers.
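
To make the notion of a direct solver concrete, the following is a minimal sketch of a dense, sequential Cholesky factorization followed by the two triangular solves. It is not the OPT-D-COST algorithm and it does not model selective nesting, sparsity, or task parallelism; the function names `cholesky` and `solve` are illustrative choices, not taken from the paper.

```python
import math


def cholesky(a):
    """Return the lower-triangular L with A = L * L^T.

    `a` is assumed to be a symmetric positive definite matrix given as a
    list of rows. Dense and sequential: an illustration only.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: a[j][j] minus the contributions of earlier columns.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def solve(l, b):
    """Solve A x = b given the Cholesky factor L of A."""
    n = len(l)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Backward substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x
```

A sparse solver would avoid work on the zero entries that this dense version still visits, and a parallel solver would split the column updates into tasks. The granularity of those tasks is the question the abstract addresses.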

research
10/13/2017

On Parallel Solution of Sparse Triangular Linear Systems in CUDA

The acceleration of sparse matrix computations on modern many-core proce...
research
05/08/2023

Parallel Cholesky Factorization for Banded Matrices using OpenMP Tasks

Cholesky factorization is a widely used method for solving linear system...
research
07/29/2016

An Asynchronous Task-based Fan-Both Sparse Cholesky Solver

Systems of linear equations arise at the heart of many scientific and en...
research
06/17/2019

Deep Learning of Preconditioners for Conjugate Gradient Solvers in Urban Water Related Problems

Solving systems of linear equations is a problem occurring frequently in ...
research
10/12/2020

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization

Dense linear algebra kernels, such as linear solvers or tensor contracti...
research
09/25/2015

Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

We discuss an approach for solving sparse or dense banded linear systems...
research
01/23/2019

Parallelization and scalability analysis of inverse factorization using the Chunks and Tasks programming model

We present three methods for distributed memory parallel inverse factori...
