Evaluating Abstract Asynchronous Schwarz solvers on GPUs

03/11/2020
by Pratik Nayak, et al.

With the commencement of the exascale computing era, the majority of leadership supercomputers are heterogeneous and massively parallel even on a single node, with multiple co-processors such as GPUs and multiple CPU cores on each node. For example, ORNL's Summit has six NVIDIA Tesla V100 GPUs and 42 IBM Power9 CPU cores on each node. Synchronizing across all these compute resources within a single node, or across multiple nodes, is prohibitively expensive. Hence it is necessary to develop and study asynchronous algorithms that circumvent this cost of bulk-synchronous computing for massive parallelism. In this study, we examine the asynchronous version of the abstract Restricted Additive Schwarz method as a solver: we do not synchronize explicitly, but allow the communication of data between the sub-domains to be completely asynchronous, thereby removing the bulk-synchronous nature of the algorithm. We accomplish this by using the one-sided RMA functions of the MPI standard. We study the benefits of such an asynchronous solver over its synchronous counterpart on both multi-core architectures and multiple GPUs. We also study the communication patterns and local solvers and their effect on the global solver. Finally, we show that this concept can yield attractive runtime benefits over the synchronous counterpart.
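To make the contrast between the synchronous and asynchronous variants concrete, the following is a minimal serial sketch, not the authors' implementation. It applies a Restricted Additive Schwarz iteration to a 1D Poisson problem chosen here purely for illustration. A plain Python list stands in for the shared window that the paper accesses through MPI one-sided RMA, and the staleness model (a sub-domain refreshes its halo view with probability `refresh_prob`) is an assumption. All function names, the problem, and the sizes are hypothetical.

```python
import random

def thomas(rhs, h):
    """Directly solve tridiag(-1, 2, -1)/h^2 * u = rhs (Thomas algorithm)."""
    n = len(rhs)
    diag, off = 2.0 / h**2, -1.0 / h**2
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag, rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u

def local_solve(x, f, h, lo, hi):
    """Dirichlet solve on points lo..hi-1, with x[lo-1] and x[hi] as boundary data."""
    rhs = list(f[lo:hi])
    if lo > 0:
        rhs[0] += x[lo - 1] / h**2
    if hi < len(x):
        rhs[-1] += x[hi] / h**2
    return thomas(rhs, h)

def partition(n, p, overlap):
    """Return ((owned_start, owned_end), (ext_start, ext_end)) for p sub-domains."""
    size = -(-n // p)  # ceiling division
    parts = []
    for k in range(p):
        s, e = k * size, min(n, (k + 1) * size)
        parts.append(((s, e), (max(0, s - overlap), min(n, e + overlap))))
    return parts

def ras_sync(f, h, parts, iters):
    """Synchronous RAS: every sub-domain reads the same iterate each sweep."""
    x = [0.0] * len(f)
    for _ in range(iters):
        snap = list(x)  # plays the role of a global barrier
        for (s, e), (lo, hi) in parts:
            u = local_solve(snap, f, h, lo, hi)
            x[s:e] = u[s - lo:e - lo]  # restricted: write back owned part only
    return x

def ras_async(f, h, parts, events, refresh_prob=0.5, seed=0):
    """Emulated asynchronous RAS: sub-domains update in random order and may
    solve with stale halo data (assumed staleness model, no barrier)."""
    rng = random.Random(seed)
    x = [0.0] * len(f)                 # stands in for the shared RMA window
    views = [list(x) for _ in parts]   # each sub-domain's possibly stale copy
    for _ in range(events):
        k = rng.randrange(len(parts))
        if rng.random() < refresh_prob:
            views[k] = list(x)         # emulates a completed one-sided transfer
        (s, e), (lo, hi) = parts[k]
        u = local_solve(views[k], f, h, lo, hi)
        x[s:e] = u[s - lo:e - lo]
    return x

def max_error(x, h):
    """Max error against u(t) = t(1 - t)/2, the exact solution for f = 1."""
    return max(abs(xi - 0.5 * (i + 1) * h * (1 - (i + 1) * h))
               for i, xi in enumerate(x))

if __name__ == "__main__":
    n = 63
    h = 1.0 / (n + 1)
    f = [1.0] * n
    parts = partition(n, 4, 4)
    print("sync error: ", max_error(ras_sync(f, h, parts, 200), h))
    print("async error:", max_error(ras_async(f, h, parts, 20000), h))
```

In this emulation both variants approach the same discrete solution. The sketch says nothing about runtime, since the benefit the abstract reports comes from avoiding synchronization cost on real hardware, which a serial script cannot measure.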

research
03/11/2020

Evaluating Abstract Asynchronous Schwarz solvers

With the commencement of the exascale computing era, we realize that the...
research
08/24/2018

Asynchronous One-Level and Two-Level Domain Decomposition Solvers

Parallel implementations of linear iterative solvers generally alternate...
research
04/05/2020

On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems

In the realm of big data and machine learning, data-parallel, distribute...
research
10/24/2019

XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training

We propose XPipe, an efficient asynchronous pipeline model parallelism a...
research
09/26/2020

A highly scalable approach to solving linear systems using two-stage multisplitting

Iterative methods for solving large sparse systems of linear equations a...
research
11/24/2022

MRHS multigrid solver for Wilson-clover fermions

We describe our implementation of a multigrid solver for Wilson-clover f...
research
05/15/2022

Physics-inspired Ising Computing with Ring Oscillator Activated p-bits

The nearing end of Moore's Law has been driving the development of domai...
