Rapid Exploration of Optimization Strategies on Advanced Architectures using TestSNAP and LAMMPS

11/25/2020
by Rahulkumar Gayatri, et al.

The exascale race is at an end with the announcement of the Aurora and Frontier machines. This next generation of supercomputers utilizes diverse hardware architectures to achieve its compute performance, placing an added onus on the performance portability of applications. An expanding fragmentation of programming models would pose a compounding optimization challenge were it not for the evolution of performance-portable frameworks, which provide unified models for mapping abstract hierarchies of parallelism to diverse architectures. Kokkos is one such performance-portable programming model for C++ applications, providing back-end implementations for each major HPC platform. Even with a performance-portable framework, restructuring algorithms to expose higher degrees of parallelism is non-trivial. The Spectral Neighbor Analysis Potential (SNAP) is a machine-learned inter-atomic potential utilized in cutting-edge molecular dynamics simulations. Previous implementations of the SNAP calculation showed a downward trend in performance relative to peak on newer-generation CPUs and low performance on GPUs. In this paper we describe the restructuring and optimization of SNAP as implemented in the Kokkos CUDA backend of the LAMMPS molecular dynamics package, benchmarked on NVIDIA GPUs. We identify novel patterns of hierarchical parallelism that minimize memory access overheads and push the implementation into a compute-saturated regime. Our implementation via Kokkos enables recompile-and-run efficiency on upcoming architectures. We find a ∼22x time-to-solution improvement relative to an existing implementation, as measured on an NVIDIA Tesla V100-16GB for an important benchmark.


research
04/04/2023

Portable Programming Model Exploration for LArTPC Simulation in a Heterogeneous Computing Environment: OpenMP vs. SYCL

The evolution of the computing landscape has resulted in the proliferati...
research
06/23/2017

Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-Core Architectures

Reactive molecular dynamics simulations are computationally demanding. R...
research
12/22/2021

Lifting C Semantics for Dataflow Optimization

C is the lingua franca of programming and almost any device can be progr...
research
05/30/2022

Closing the Performance Gap with Modern C++

On the way to Exascale, programmers face the increasing challenge of hav...
research
09/22/2021

Code modernization strategies for short-range non-bonded molecular dynamics simulations

As modern HPC systems increasingly rely on greater core counts and wider...
research
10/19/2020

Evaluating the Cost of Atomic Operations on Modern Architectures

Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-...
research
04/18/2019

Memory and Parallelism Analysis Using a Platform-Independent Approach

Emerging computing architectures such as near-memory computing (NMC) pro...
