Scalable Parallel Linear Solver for Compact Banded Systems on Heterogeneous Architectures

12/29/2020
by Hang Song, et al.

A scalable algorithm for solving compact banded linear systems on distributed memory architectures is presented. The proposed method factorizes the original system across two levels of the memory hierarchy and solves it using parallel cyclic reduction on both distributed and shared memory. Compared with conventional algorithms that rely on data transposes or re-partitioning, the method has a lower communication footprint across distributed memory partitions. The algorithm is generalized to cyclic compact banded systems with flexible data decompositions. For cyclic compact banded systems, the method is a direct solver with deterministic operation and communication counts that depend on the matrix size, its bandwidth, and the partition strategy. Implementation and runtime configuration details are discussed for performance optimization. Scalability is demonstrated for the linear solver as well as for a representative fluid mechanics application, in which the dominant computational cost is solving the cyclic tridiagonal linear systems of compact numerical schemes on a 3D periodic domain. The algorithm is particularly useful for solving the linear systems that arise when compact finite difference operators are applied to a wide range of partial differential equation problems, including but not limited to numerical simulations of compressible turbulent flows, aeroacoustics, elastic-plastic wave propagation, and electromagnetics. It alleviates obstacles to the use of such operators on modern high performance computing hardware, where memory and computational power are distributed across nodes with multi-threaded processing units.
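
To make the building block named in the abstract concrete, the following is a minimal single-process sketch of parallel cyclic reduction applied to a cyclic tridiagonal system. It is not the authors' two-level distributed algorithm. The function name `pcr_cyclic_tridiagonal` is hypothetical, and the sketch assumes the system size is a power of two and that the matrix is well conditioned enough (for example, diagonally dominant) that no pivoting is needed.

```python
def pcr_cyclic_tridiagonal(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], indices taken mod n.

    Illustrative sketch only: assumes n is a power of two and no pivoting.
    """
    n = len(b)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    a = [float(v) for v in a]
    b = [float(v) for v in b]
    c = [float(v) for v in c]
    d = [float(v) for v in d]

    s = 1  # current coupling stride; doubles after every reduction step
    while s < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # Every row is updated independently of the others, which is what
        # makes the step parallel; here the rows are simply looped over.
        for i in range(n):
            lo, hi = (i - s) % n, (i + s) % n
            alpha = -a[i] / b[lo]  # eliminates x[i-s] using row i-s
            gamma = -c[i] / b[hi]  # eliminates x[i+s] using row i+s
            nb[i] = b[i] + alpha * c[lo] + gamma * a[hi]
            nd[i] = d[i] + alpha * d[lo] + gamma * d[hi]
            na[i] = alpha * a[lo]  # new coupling to x[i-2s]
            nc[i] = gamma * c[hi]  # new coupling to x[i+2s]
        a, b, c, d = na, nb, nc, nd
        s *= 2

    # After log2(n) steps both neighbours of row i wrap around to x[i] itself,
    # so each row reads (a[i] + b[i] + c[i]) * x[i] = d[i].
    return [d[i] / (a[i] + b[i] + c[i]) for i in range(n)]


# Usage: a periodic system with constant off-diagonal weight 1/3 (chosen for illustration).
n = 8
x = pcr_cyclic_tridiagonal([1 / 3] * n, [1.0] * n, [1 / 3] * n, [1.0] * n)
```

Each reduction step only reads rows at distance `s`, so in a distributed setting the data exchange is between specific partner rows rather than a global transpose. How the paper organizes this across distributed and shared memory is described in the full text, not in this sketch.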

