A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting

11/19/2016
by Sandra Catalán, et al.
We propose two novel techniques for overcoming the load imbalance encountered when implementing so-called look-ahead mechanisms in dense matrix factorizations for the solution of linear systems. Both techniques target the scenario where two thread teams are created/activated during the factorization, with each team in charge of performing an independent task/branch of execution. The first technique promotes worker sharing (WS) between the two tasks, allowing the threads of the task that completes first to be reallocated to the costlier task. The second technique allows the faster task to alert the slower task of its completion, enforcing the early termination (ET) of the slower task and a smooth transition of the factorization procedure into the next iteration. The two mechanisms are instantiated via a new malleable thread-level implementation of the Basic Linear Algebra Subprograms (BLAS), and their benefits are illustrated via an implementation of the LU factorization with partial pivoting enhanced with look-ahead. Concretely, our experimental results on a six-core Intel Xeon processor show the benefits of combining WS+ET, reporting competitive performance in comparison with a task-parallel runtime-based solution.
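The early-termination idea can be pictured with a minimal Python sketch. It is not the authors' implementation: the function names `factor_panel`, `update_trailing`, and `run_iteration` are hypothetical, the numerical work is replaced by placeholders, and the choice of which task plays the slow role (here a panel factorization) and which the fast role (here a trailing update) is an assumption made for illustration. The sketch only shows the signalling pattern: the fast task sets a shared flag on completion, and the slow task polls that flag between units of work and stops as soon as it is set.

```python
import threading


def factor_panel(num_columns, stop_requested):
    """Slow task (assumed): process columns one by one, polling for an ET request.

    Returns the number of columns actually processed.
    """
    done = 0
    for _ in range(num_columns):
        if stop_requested.is_set():  # the other task has finished: stop early
            break
        # Placeholder for the real per-column numerical work.
        done += 1
    return done


def update_trailing(num_blocks, stop_requested):
    """Fast task (assumed): process its blocks, then alert the slow task."""
    for _ in range(num_blocks):
        pass  # placeholder for the real block update
    stop_requested.set()


def run_iteration(num_columns, num_blocks):
    """Run both tasks concurrently.

    Returns (columns completed by the slow task, whether the flag was raised).
    """
    stop_requested = threading.Event()
    result = {}

    def panel_worker():
        result["columns_done"] = factor_panel(num_columns, stop_requested)

    panel = threading.Thread(target=panel_worker)
    update = threading.Thread(
        target=update_trailing, args=(num_blocks, stop_requested)
    )
    panel.start()
    update.start()
    panel.join()
    update.join()
    return result["columns_done"], stop_requested.is_set()
```

Calling `run_iteration(64, 4)` returns how many of the 64 columns the slow task completed before it observed the flag; the exact count depends on thread scheduling, so the next iteration would have to continue from whatever point was reached. Worker sharing is the complementary case, in which the task that finishes first hands its threads to the other task rather than stopping it.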

research
04/19/2018

Programming Parallel Dense Matrix Factorizations with Look-Ahead and OpenMP

We investigate a parallelization strategy for dense matrix factorization...
research
01/22/2016

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

We introduce a task-parallel algorithm for sparse incomplete Cholesky fa...
research
12/16/2021

Large Scale Distributed Linear Algebra With Tensor Processing Units

We have repurposed Google Tensor Processing Units (TPUs), application-sp...
research
09/01/2017

Look-Ahead in the Two-Sided Reduction to Compact Band Forms for Symmetric Eigenvalue Problems and the SVD

We address the reduction to compact band forms, via unitary similarity t...
research
04/06/2023

Formal Derivation of LU Factorization with Pivoting

The FLAME methodology for deriving linear algebra algorithms from specif...
research
10/12/2020

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization

Dense linear algebra kernels, such as linear solvers or tensor contracti...
research
05/08/2023

Parallel Cholesky Factorization for Banded Matrices using OpenMP Tasks

Cholesky factorization is a widely used method for solving linear system...
