Stage-parallel fully implicit Runge-Kutta implementations with optimal multilevel preconditioners at the scaling limit

09/14/2022
by Peter Munch, et al.

We present an implementation of a fully stage-parallel preconditioner for Radau IIA type fully implicit Runge–Kutta methods. The preconditioner approximates the inverse of A_Q from the Butcher tableau by the lower triangular matrix resulting from an LU decomposition and diagonalizes the system into as many blocks as there are stages. For the transformed system, we employ a block preconditioner in which each block is distributed to and solved by a subgroup of processes in parallel. To combine partial results, we use either a communication pattern resembling Cannon's algorithm or shared memory. A performance model and a large set of performance studies (including strong-scaling runs with up to 150k processes on 3k compute nodes), conducted for a time-dependent heat problem using matrix-free finite element methods, indicate that the stage-parallel implementation can reach higher throughput when the block solvers operate at lower parallel efficiencies, which occurs near the scaling limit. Achievable speedups increase linearly with the number of stages and are bounded by the number of stages. Furthermore, we show that the presented stage-parallel concepts also apply when A_Q is diagonalized directly, which requires complex arithmetic or the solution of two-by-two blocks and sequentializes parts of the algorithm. As an alternative to distributing stages and assigning them to distinct processes, we discuss the possibility of batching operations from different stages together.
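
The algebra behind the first two sentences of the abstract can be illustrated on a tiny example. The following is a minimal sketch and not the authors' implementation. It assumes the two-stage Radau IIA tableau, an LU factorization of the inverse of A_Q normalized so that U has a unit diagonal (which gives L distinct real diagonal entries), and a heat-type preconditioner of the form L ⊗ M + τ I ⊗ K. The helper `crout_lu`, the matrices `M` and `K`, and the step size `tau` are hypothetical names chosen for illustration.

```python
import numpy as np


def crout_lu(a):
    """Factor a = L @ U without pivoting, with U unit upper triangular."""
    n = a.shape[0]
    lower = np.zeros_like(a, dtype=float)
    upper = np.eye(n)
    for j in range(n):
        # Column j of L (entries on and below the diagonal).
        for i in range(j, n):
            lower[i, j] = a[i, j] - lower[i, :j] @ upper[:j, j]
        # Row j of U (entries right of the unit diagonal).
        for i in range(j + 1, n):
            upper[j, i] = (a[j, i] - lower[j, :j] @ upper[:j, i]) / lower[j, j]
    return lower, upper


# Two-stage Radau IIA coefficient matrix A_Q from the Butcher tableau.
A_Q = np.array([[5 / 12, -1 / 12],
                [3 / 4, 1 / 4]])
Q = A_Q.shape[0]

# Approximate inv(A_Q) by the lower triangular factor L of inv(A_Q) = L U.
A_inv = np.linalg.inv(A_Q)
L, U = crout_lu(A_inv)

# L is triangular with distinct diagonal entries, so it has a real
# eigendecomposition L = T diag(lam) inv(T).
lam, T = np.linalg.eig(L)

# Assumed toy spatial operators: identity mass matrix, 1D Laplacian stiffness.
n = 4
M = np.eye(n)
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
tau = 0.1  # assumed time step size

# Assumed preconditioner form for a heat-type problem.
P = np.kron(L, M) + tau * np.kron(np.eye(Q), K)

# Transforming with T (x) I decouples the stages: one block per stage.
T_big = np.kron(T, np.eye(n))
P_hat = np.linalg.inv(T_big) @ P @ T_big
P_blockdiag = np.kron(np.diag(lam), M) + tau * np.kron(np.eye(Q), K)

print("eigenvalues of L:", np.sort(lam))
print("block diagonal after transform:", np.allclose(P_hat, P_blockdiag))
```

In this sketch, each diagonal block `lam[i] * M + tau * K` is an independent real system, which is the property that would allow each block to be handed to its own subgroup of processes.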
