Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-Core Architectures
Reactive molecular dynamics simulations are computationally demanding. Reaching spatial and temporal scales where interesting scientific phenomena can be observed requires efficient and scalable implementations on modern hardware. In this paper, we focus on optimizing the performance of the widely used LAMMPS/ReaxC package for multi-core architectures. As hybrid parallelism allows better leverage of the increasing on-node parallelism, we adopt thread parallelism in the construction of bonded and nonbonded lists, and in the computation of complex ReaxFF interactions. To mitigate the I/O overheads due to large volumes of trajectory data produced and to save users the burden of post-processing, we also develop a novel in-situ tool for molecular species analysis. We analyze the performance of the resulting ReaxC-OMP package on Mira, an IBM Blue Gene/Q supercomputer. For PETN systems of sizes ranging from 32 thousand to 16.6 million particles, we observe speedups in the range of 1.5-4.5x. We observe sustained performance improvements for up to 262,144 cores (1,048,576 processes) of Mira and a weak scaling efficiency of 91.5 simulations containing 16.6 million particles. The in-situ molecular species analysis tool incurs only insignificant overheads across various system sizes and run configurations.
READ FULL TEXT