Integrating State of the Art Compute, Communication, and Autotuning Strategies to Multiply the Performance of the Application Programm CPMD for Ab Initio Molecular Dynamics Sim

03/18/2020
by   Tobias Klöffel, et al.

We present our recent code modernizations of the ab initio molecular dynamics program CPMD (www.cpmd.org), with a special focus on the ultra-soft pseudopotential (USPP) code path. Guided by the internal instrumentation of CPMD, all time-critical routines have been revised to maximize computational throughput and minimize communication overhead. Throughout the program, missing hybrid MPI+OpenMP parallelization has been added to improve scaling. For communication-intensive routines, such as the multiple distributed 3D FFTs of the electronic states and the distributed matrix-matrix multiplications related to the β-projectors of the pseudopotentials, this MPI+OpenMP parallelization now overlaps computation and communication. The necessary partitioning of the workload is optimized by an auto-tuning algorithm. In addition, the largest global MPI_Allreduce operation has been replaced by highly tuned, node-local parallelized operations using MPI shared-memory windows to avoid inter-node communication. A batched algorithm for the multiple 3D FFTs improves the throughput of the MPI_Alltoall communication and thus the scalability of the implementation, both for the USPP and for the frequently used norm-conserving pseudopotential code path. The enhanced performance and scalability are demonstrated on a mid-sized benchmark system of 256 water molecules and on further water systems ranging from 32 to 2048 molecules.
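
To make the auto-tuned workload partitioning more concrete, the following is a minimal Python sketch of the general idea only, not CPMD's actual algorithm (CPMD itself is not written in Python). It tries several batch sizes and keeps the cheapest one. The names `autotune` and `model_cost`, and all cost parameters, are hypothetical: `model_cost` is a toy stand-in for a real timing measurement, assuming that larger batches save per-message latency while reducing the opportunity to overlap communication with computation.

```python
import math
from typing import Callable, Dict, Iterable, Tuple


def model_cost(batch_size: int, n_items: int,
               latency: float = 1.0,
               per_item: float = 0.1,
               overlap_penalty: float = 0.25) -> float:
    """Toy cost model (an assumption, not measured data).

    - every batch pays a fixed communication latency,
    - every item pays a fixed transfer/compute cost,
    - larger batches are charged a penalty for reduced overlap.
    """
    n_batches = math.ceil(n_items / batch_size)
    return n_batches * latency + n_items * per_item + batch_size * overlap_penalty


def autotune(candidates: Iterable[int], n_items: int,
             measure: Callable[[int, int], float]) -> Tuple[int, Dict[int, float]]:
    """Evaluate each candidate batch size and return the cheapest one.

    `measure(batch_size, n_items)` returns a cost; in a real code this
    would be a wall-clock timing of the batched communication and
    computation step rather than a model.
    """
    timings = {b: measure(b, n_items) for b in candidates}
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    best, timings = autotune([1, 2, 4, 8, 16, 32, 64], n_items=64, measure=model_cost)
    for b, t in timings.items():
        print(f"batch size {b:3d}: cost {t:6.2f}")
    print("selected batch size:", best)
```

In a production setting the `measure` callable would time the real routine over a few repetitions, and the selected value would be cached and reused for subsequent steps.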


