Acceleration of a production Solar MHD code with Fortran standard parallelism: From OpenACC to 'do concurrent'

03/05/2023
by Ronald M. Caplan, et al.

There is growing interest in using standard language constructs for accelerated computing, avoiding the need for (often vendor-specific) external APIs. These constructs hold the potential to be more portable and much more 'future-proof'. For Fortran codes, the current focus is on the do concurrent (DC) loop. While there have been some successful examples of GPU-acceleration using DC for benchmark and/or small codes, its widespread adoption will require demonstrations of its use in full-size applications. Here, we look at the current capabilities and performance of using DC in a production application called Magnetohydrodynamic Algorithm outside a Sphere (MAS). MAS is a state-of-the-art model for studying coronal and heliospheric dynamics, is over 70,000 lines long, and has previously been ported to GPUs using MPI+OpenACC. We attempt to eliminate as many of its OpenACC directives as possible in favor of DC. We show that using the NVIDIA nvfortran compiler's Fortran 202X preview implementation, unified managed memory, and modified MPI launch methods, we can achieve GPU acceleration across multiple GPUs without using a single OpenACC directive. However, doing so results in a slowdown between 1.25x and 3x. We discuss what future improvements are needed to avoid this loss, and show how we can still retain close...
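To make "eliminating OpenACC directives in favor of DC" concrete at the loop level, here is a minimal sketch that is not taken from MAS: a simple a*x + y update written once with an OpenACC directive and once with do concurrent. The module, routine, and program names (axpy_variants, axpy_acc, axpy_dc, compare_axpy) are hypothetical, and the compiler flags mentioned in the comments assume the NVIDIA HPC SDK. Compiled without OpenACC enabled, the !$acc line is an ordinary comment and both loops simply run on the CPU.

```fortran
! Hypothetical illustration (not MAS code): one loop, two ways of
! expressing that its iterations may run in parallel.
module axpy_variants
  implicit none
contains
  ! Directive-based version: the !$acc line requests GPU offload only
  ! when OpenACC is enabled (e.g. nvfortran -acc); otherwise it is a comment.
  subroutine axpy_acc(n, a, x, y)
    integer, intent(in) :: n
    real, intent(in) :: a, x(n)
    real, intent(inout) :: y(n)
    integer :: i
    !$acc parallel loop
    do i = 1, n
      y(i) = a*x(i) + y(i)
    end do
  end subroutine axpy_acc

  ! Standard-language version: no directives. A compiler such as
  ! nvfortran with -stdpar=gpu may offload this loop, typically relying
  ! on managed memory for data movement instead of explicit data directives.
  subroutine axpy_dc(n, a, x, y)
    integer, intent(in) :: n
    real, intent(in) :: a, x(n)
    real, intent(inout) :: y(n)
    integer :: i
    do concurrent (i = 1:n)
      y(i) = a*x(i) + y(i)
    end do
  end subroutine axpy_dc
end module axpy_variants

program compare_axpy
  use axpy_variants
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y_acc(n), y_dc(n)
  integer :: i

  ! Integer-valued inputs keep every result exactly representable.
  do i = 1, n
    x(i) = real(i)
  end do
  y_acc = 1.0
  y_dc = 1.0

  call axpy_acc(n, 2.0, x, y_acc)
  call axpy_dc(n, 2.0, x, y_dc)

  ! Both forms compute the same values; only how parallelism is expressed differs.
  if (any(y_acc /= y_dc)) error stop "results differ"
  print *, "max |y_acc - y_dc| =", maxval(abs(y_acc - y_dc))
end program compare_axpy
```

The design point the sketch is meant to show: the DC loop carries no vendor-specific annotation, so the same source stays valid standard Fortran whether or not a GPU is targeted, while data placement is left to the compiler and runtime rather than to explicit directives.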
