MPI+X: task-based parallelization and dynamic load balance of finite element assembly

05/09/2018
by   Marta Garcia-Gasulla, et al.
0

The main computing tasks of a finite element code(FE) for solving partial differential equations (PDE's) are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. Although we will describe algorithms in the FE context, a similar strategy can be straightforwardly applied to other discretization methods, like the finite volume method. The matrix assembly consists of a loop over the elements of the MPI partition to compute element matrices and right-hand sides and their assemblies in the local system to each MPI partition. In a MPI+X hybrid parallelism context, X has consisted traditionally of loop parallelism using OpenMP. Several strategies have been proposed in the literature to implement this loop parallelism, like coloring or substructuring techniques to circumvent the race condition that appears when assembling the element system into the local system. The main drawback of the first technique is the decrease of the IPC due to bad spatial locality. The second technique avoids this issue but requires extensive changes in the implementation, which can be cumbersome when several element loops should be treated. We propose an alternative, based on the task parallelism of the element loop using some extensions to the OpenMP programming model. The taskification of the assembly solves both aforementioned problems. In addition, dynamic load balance will be applied using the DLB library, especially efficient in the presence of hybrid meshes, where the relative costs of the different elements is impossible to estimate a priori. This paper presents the proposed methodology, its implementation and its validation through the solution of large computational mechanics problems up to 16k cores.

READ FULL TEXT
research
02/09/2018

GPU Accelerated Finite Element Assembly with Runtime Compilation

In recent years, high performance scientific computing on graphics proce...
research
08/08/2021

Parallel Sub-Structuring Methods for solving Sparse Linear Systems on a cluster of GPU

The main objective of this work consists in analyzing sub-structuring me...
research
03/22/2019

Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

Computationally-intensive loops are the primary source of parallelism in...
research
01/21/2021

Efficient quadrature rules for finite element discretizations of nonlocal equations

In this paper we design efficient quadrature rules for finite element di...
research
06/18/2018

Implementation of Peridynamics utilizing HPX -- the C++ standard library for parallelism and concurrency

Peridynamics is a non-local generalization of continuum mechanics tailor...
research
05/21/2020

A cookbook for finite element methods for nonlocal problems, including quadrature rules and approximate Euclidean balls

The implementation of finite element methods (FEMs) for nonlocal models ...
research
05/10/2023

Improving the performance of classical linear algebra iterative methods via hybrid parallelism

We propose fork-join and task-based hybrid implementations of four class...

Please sign up or login with your details

Forgot password? Click here to reset