From MPI to MPI+OpenACC: Conversion of a legacy FORTRAN PCG solver for the spherical Laplace equation
A real-world example of adding OpenACC to a legacy MPI FORTRAN Preconditioned Conjugate Gradient (PCG) code is described, and timing results for multi-node, multi-GPU runs are shown. The code computes three-dimensional solutions to the Laplace equation in spherical coordinates; its primary application is finding potential field solutions of the solar corona, a useful tool in space weather modeling. We highlight key tips, strategies, and challenges faced when adding OpenACC, including linking FORTRAN code to the cuSparse library, using CUDA-aware MPI, maintaining portability, and dealing with multi-node, multi-GPU run-time environments. Timing results are shown for the code running with MPI only (up to 1728 CPU cores) and with MPI+OpenACC (up to 64 NVIDIA P100 GPUs). Performance portability is also addressed, including results using MPI+OpenACC on multi-core x86 CPUs.
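For readers unfamiliar with the approach, the sketch below illustrates two of the building blocks named in the abstract: an OpenACC-accelerated loop in a Fortran PCG-style kernel and a CUDA-aware MPI exchange through host_data. It is a minimal, hypothetical example, not the paper's actual code: the routine name pcg_matvec_halo, the array names, the CSR storage format, and the neighbor ranks left/right are assumptions made for illustration, and the packing of boundary values into the send buffer is omitted.

subroutine pcg_matvec_halo(n, nnz, ncol, csr_ptr, csr_idx, csr_val, x, y, &
                           sbuf, rbuf, nbuf, left, right, comm)
  use mpi
  implicit none
  integer, intent(in)    :: n, nnz, ncol, nbuf, left, right, comm
  integer, intent(in)    :: csr_ptr(n+1), csr_idx(nnz)
  real(8), intent(in)    :: csr_val(nnz), x(ncol)
  real(8), intent(out)   :: y(n)
  real(8), intent(inout) :: sbuf(nbuf), rbuf(nbuf)
  integer :: i, j, ierr, req(2)
  real(8) :: s

  ! Local CSR matrix-vector product offloaded with OpenACC.  The arrays are
  ! assumed to already reside on the GPU via an enclosing !$acc data region.
  !$acc parallel loop present(csr_ptr, csr_idx, csr_val, x, y) private(s)
  do i = 1, n
     s = 0.0d0
     !$acc loop seq
     do j = csr_ptr(i), csr_ptr(i+1) - 1
        s = s + csr_val(j)*x(csr_idx(j))
     end do
     y(i) = s
  end do

  ! CUDA-aware MPI halo exchange: host_data hands MPI the *device* addresses
  ! of the buffers, so GPU memory is transferred directly between ranks.
  !$acc host_data use_device(sbuf, rbuf)
  call MPI_Irecv(rbuf, nbuf, MPI_REAL8, left,  0, comm, req(1), ierr)
  call MPI_Isend(sbuf, nbuf, MPI_REAL8, right, 0, comm, req(2), ierr)
  !$acc end host_data
  call MPI_Waitall(2, req, MPI_STATUSES_IGNORE, ierr)
end subroutine pcg_matvec_halo

The same directives are ignored by compilers without OpenACC support (or retargeted to multi-core CPUs), which is one way such a conversion can keep a single portable source tree.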