Parallel Performance of Molecular Dynamics Trajectory Analysis

06/28/2019
by Mahzad Khoshlessan et al.

The performance of biomolecular molecular dynamics (MD) simulations has steadily increased on modern high-performance computing (HPC) resources, but acceleration of the analysis of the output trajectories has lagged behind, so that analyzing simulations is increasingly becoming a bottleneck. To close this gap, we studied the performance of parallel trajectory analysis with MPI and the Python MDAnalysis library on three different XSEDE supercomputers where trajectories were read from a Lustre parallel file system. We found that strong scaling performance was impeded by stragglers, MPI processes that were slower than the typical process and that therefore dominated the overall run time. Stragglers were less prevalent for compute-bound workloads, pointing to file reading as a crucial bottleneck for scaling. However, a more complicated picture emerged in which both the computation and the ingestion of data exhibited close to ideal strong-scaling behavior, whereas stragglers were primarily caused by either large MPI communication costs or long times to open the single shared trajectory file. We improved overall strong scaling performance with two different approaches to file access, namely subfiling (splitting the trajectory into as many trajectory segments as there are MPI processes) and MPI-IO with Parallel HDF5 trajectory files. Applying these strategies, we obtained near-ideal strong scaling on up to 384 cores (16 nodes). We summarize our lessons learned in guidelines and strategies on how to take advantage of the available HPC resources to gain good scalability and potentially reduce trajectory analysis times by two orders of magnitude compared to the prevalent serial approach.
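
The split-frame MPI parallelization studied in the paper can be illustrated with a minimal sketch using mpi4py and MDAnalysis. This is not the authors' benchmark code: the input files (topol.tpr, traj.xtc) and the per-frame observable (radius of gyration) are illustrative placeholders.

# Minimal sketch of split-frame MPI-parallel trajectory analysis
# (assumed placeholder inputs, not the paper's actual workload).
from mpi4py import MPI
import numpy as np
import MDAnalysis as mda

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Every rank opens the same shared trajectory file; the paper identifies
# this file-open step as one source of stragglers.
u = mda.Universe("topol.tpr", "traj.xtc")
protein = u.select_atoms("protein")

# Contiguous block decomposition: one block of frames per MPI rank.
blocks = np.array_split(np.arange(u.trajectory.n_frames), size)
start, stop = blocks[rank][0], blocks[rank][-1] + 1

# Compute phase: each rank iterates only over its own frames.
local = []
for ts in u.trajectory[start:stop]:
    local.append((ts.frame, protein.radius_of_gyration()))

# Communication phase: gather per-rank results on rank 0.
results = comm.gather(local, root=0)
if rank == 0:
    timeseries = [row for block in results for row in block]
    print(f"collected {len(timeseries)} frames")

Such a script would be launched in the usual MPI fashion, e.g. mpiexec -n 384 python analysis.py, with one process per core.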

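For the MPI-IO route, the following sketch shows collective reads from a single shared HDF5 trajectory, assuming h5py built against Parallel HDF5. The dataset name "coordinates" and its (n_frames, n_atoms, 3) layout are assumptions for illustration, not the paper's exact file schema.

# Minimal sketch of collective MPI-IO reads from one shared HDF5 file
# (assumed dataset layout; requires h5py built with parallel HDF5).
from mpi4py import MPI
import numpy as np
import h5py

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# All ranks open the one shared file through the MPI-IO driver.
with h5py.File("traj.h5", "r", driver="mpio", comm=comm) as f:
    coords = f["coordinates"]
    frames = np.array_split(np.arange(coords.shape[0]), size)[rank]
    # Collective read: every rank participates, each fetching its own block.
    with coords.collective:
        block = coords[frames[0]:frames[-1] + 1]

print(f"rank {rank}: read {block.shape[0]} frames")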

research · 10/20/2021
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems
Python has become a dominant programming language for emerging areas lik...

research · 08/29/2022
Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software
GROMACS is one of the most widely used HPC software packages using the M...

research · 05/12/2023
Design and Development of a Java Parallel I/O Library
Parallel I/O refers to the ability of scientific programs to concurrentl...

research · 07/29/2019
Improving MPI Collective I/O Performance With Intra-node Request Aggregation
Two-phase I/O is a well-known strategy for implementing collective MPI-I...

research · 01/23/2018
Task-parallel Analysis of Molecular Dynamics Trajectories
Different frameworks for implementing parallel data analytics applicatio...

research · 03/02/2021
Scalable communication for high-order stencil computations using CUDA-aware MPI
Modern compute nodes in high-performance computing provide a tremendous ...

research · 07/13/2021
Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2
This paper aims to create a transition path from file-based IO to stream...
