Improving strong scaling of the Conjugate Gradient method for solving large linear systems using global reduction pipelining

05/15/2019
by Siegfried Cools, et al.

This paper presents performance results comparing MPI-based implementations of the popular Conjugate Gradient (CG) method and several of its communication-hiding (or 'pipelined') variants. Pipelined CG methods are designed to efficiently solve symmetric positive definite (SPD) linear systems on massively parallel distributed-memory hardware, and they typically display significantly improved strong scaling compared to classic CG. This increase in parallel performance is achieved by overlapping the global reduction phase (MPI_Iallreduce), which is required to compute the inner products in each iteration, with (chiefly local) computational work such as the matrix-vector product, as well as with other global communication. This work includes a brief introduction to the deep pipelined CG method, p(l)-CG, for readers who may be unfamiliar with its specifics. A brief overview of implementation details provides the practical tools required to implement the algorithm. Subsequently, easily reproducible strong scaling results are presented for the US Department of Energy (DoE) NERSC machine 'Cori' (Phase I - Haswell nodes) on up to 1024 nodes with 16 MPI ranks per node, using an implementation of p(l)-CG that is available in the open source PETSc library. Observations on the staggering and overlap of the asynchronous, non-blocking global communication phases with communication and computational kernels are drawn from the experiments.
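The overlap described in the abstract can be illustrated with a small single-process sketch. It is not the paper's or PETSc's implementation: a background thread stands in for a non-blocking reduction such as MPI_Iallreduce, lists of chunks stand in for MPI ranks, and all function names (`local_dot`, `tridiag_matvec`, `simulated_allreduce`, `overlapped_step`) are hypothetical. The pattern it shows is "post the reduction, do local work, then wait".

```python
from concurrent.futures import ThreadPoolExecutor


def local_dot(x, y):
    """Partial inner product computed from one rank's local entries."""
    return sum(a * b for a, b in zip(x, y))


def tridiag_matvec(x):
    """Apply the 1D Laplacian stencil [-1, 2, -1] (an SPD matrix) to x."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def simulated_allreduce(partials):
    """Assumption: stands in for the global sum a real allreduce would return."""
    return sum(partials)


def overlapped_step(r_chunks, x):
    """Post the reduction, do the matrix-vector product, then wait."""
    # Each simulated rank contributes its local part of (r, r).
    partials = [local_dot(chunk, chunk) for chunk in r_chunks]
    with ThreadPoolExecutor(max_workers=1) as pool:
        # "Post" the non-blocking reduction (analogous to MPI_Iallreduce).
        request = pool.submit(simulated_allreduce, partials)
        # Local work proceeds while the reduction is in flight.
        y = tridiag_matvec(x)
        # "Wait" for the reduction result (analogous to MPI_Wait).
        rr = request.result()
    return rr, y


if __name__ == "__main__":
    rr, y = overlapped_step([[1.0, 2.0], [3.0, 4.0]], [1.0, 2.0, 3.0, 4.0])
    print(rr, y)
```

In classic CG the reduction result is needed before the next matrix-vector product can start, so the two steps run back to back. Pipelined variants reorganize the recurrences so that the result is only required after the local work has finished, and the sketch mimics that ordering.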
