Performance portable ice-sheet modeling with MALI

04/08/2022
by   Jerry Watkins, et al.
0

High resolution simulations of polar ice-sheets play a crucial role in the ongoing effort to develop more accurate and reliable Earth-system models for probabilistic sea-level projections. These simulations often require a massive amount of memory and computation from large supercomputing clusters to provide sufficient accuracy and resolution. The latest exascale machines poised to come online contain a diverse set of computing architectures. In an effort to avoid architecture specific programming and maintain productivity across platforms, the ice-sheet modeling code known as MALI uses high level abstractions to integrate Trilinos libraries and the Kokkos programming model for performance portable code across a variety of different architectures. In this paper, we analyze the performance portable features of MALI via a performance analysis on current CPU-based and GPU-based supercomputers. The analysis highlights performance portable improvements made in finite element assembly and multigrid preconditioning within MALI with speedups between 1.26-1.82x across CPU and GPU architectures but also identifies the need to further improve performance in software coupling and preconditioning on GPUs. We also perform a weak scalability study and show that simulations on GPU-based machines perform 1.24-1.92x faster when utilizing the GPUs. The best performance is found in finite element assembly which achieved a speedup of up to 8.65x and a weak scaling efficiency of 82.9 performance testing framework developed for this code base using a changepoint detection method. The framework is used to make actionable decisions about performance within MALI. We provide several concrete examples of scenarios in which the framework has identified performance regressions, improvements, and algorithm differences over the course of two years of development.

READ FULL TEXT

page 8

page 10

page 14

page 21

page 26

page 27

page 29

research
02/09/2018

GPU Accelerated Finite Element Assembly with Runtime Compilation

In recent years, high performance scientific computing on graphics proce...
research
12/14/2021

Matrix-free approaches for GPU acceleration of a high-order finite element hydrodynamics application using MFEM, Umpire, and RAJA

With the introduction of advanced heterogeneous computing architectures ...
research
10/31/2016

Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods

In this paper we focus on the integration of high-performance numerical ...
research
10/14/2020

Performance Analysis of a Quantum Monte Carlo Application on Multiple Hardware Architectures Using the HPX Runtime

This paper describes how we successfully used the HPX programming model ...
research
08/08/2019

From Piz Daint to the Stars: Simulation of Stellar Mergers using High-Level Abstractions

We study the simulation of stellar mergers, which requires complex simul...
research
07/24/2021

Performance assessment of CUDA and OpenACC in large scale combustion simulations

GPUs have climbed up to the top of supercomputer systems making life har...
research
11/06/2017

Simple and efficient GPU parallelization of existing H-Matrix accelerated BEM code

In this paper, we demonstrate how GPU-accelerated BEM routines can be us...

Please sign up or login with your details

Forgot password? Click here to reset