PittPack: An Open-Source Poisson's Equation Solver for Extreme-Scale Computing with Accelerators

09/12/2019
by Jaber J. Hasbestan, et al.

We present a parallel implementation of a direct solver for Poisson's equation on extreme-scale supercomputers with accelerators. We introduce a chunked-pencil decomposition as the domain-decomposition strategy to distribute work among processing elements and achieve superior scalability on large numbers of accelerators. Chunked-pencil decomposition offers four benefits. First, it enables nodal communication to overlap with data transfer between the central processing units (CPUs) and the graphics processing units (GPUs). Second, it improves data locality by keeping neighboring elements in adjacent memory locations. Third, it allows the use of shared memory for certain segments of the algorithm when possible. Fourth, it enables contiguous message transfer among the nodes. Two different communication patterns are designed. The first pattern aims to fully overlap communication with data transfer and is designed to reduce overall turnaround time, whereas the second pattern concentrates on low memory usage and is more network friendly for computations at extreme scale. To ensure software portability, we interleave OpenACC with MPI in the software. The numerical solution and its formal second order of accuracy are verified using the method of manufactured solutions for various combinations of boundary conditions. Weak scaling analysis is performed with up to 1.1 trillion Cartesian mesh points on 16384 GPUs of a petascale leadership-class supercomputer.
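To illustrate the kind of verification the abstract mentions, here is a minimal sketch of the method of manufactured solutions applied to a one-dimensional Poisson problem. It is not PittPack code: the function names (`max_error`, `observed_order`), the manufactured solution u(x) = sin(pi x), the homogeneous Dirichlet boundary conditions, and the serial tridiagonal (Thomas) solve are all assumptions chosen for brevity. The idea is to pick an exact solution, derive the matching right-hand side, solve on two grids, and check that the error drops by roughly a factor of four when the spacing is halved.

```python
import math


def max_error(n):
    """Solve u'' = f on (0, 1) with u(0) = u(1) = 0 on n interior points.

    The manufactured solution is u(x) = sin(pi x), so f(x) = -pi^2 sin(pi x).
    Returns the max-norm error of the second-order central-difference solution.
    """
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    exact = [math.sin(math.pi * xi) for xi in x]
    # Right-hand side of (u[i-1] - 2 u[i] + u[i+1]) = h^2 f(x_i)
    rhs = [-(math.pi ** 2) * math.sin(math.pi * xi) * h * h for xi in x]

    # Thomas algorithm for the tridiagonal system with stencil (1, -2, 1)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = 1.0 / -2.0
    d[0] = rhs[0] / -2.0
    for i in range(1, n):
        denom = -2.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs[i] - d[i - 1]) / denom

    # Back substitution
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]

    return max(abs(a - b) for a, b in zip(u, exact))


def observed_order(n_coarse=31):
    """Estimate the convergence order from two grids with spacing h and h/2."""
    n_fine = 2 * n_coarse + 1  # halves h = 1 / (n + 1)
    return math.log2(max_error(n_coarse) / max_error(n_fine))


if __name__ == "__main__":
    print(f"observed order of accuracy: {observed_order():.3f}")
```

An observed order close to 2 is what a formally second-order discretization should produce; the same check extends to other boundary conditions by choosing a manufactured solution that satisfies them.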


