Task-parallel Analysis of Molecular Dynamics Trajectories

01/23/2018
by Ioannis Paraskevakos, et al.

The HPC and Big Data communities have proposed different frameworks for implementing parallel data analytics applications. In this paper, we investigate three frameworks, Spark, Dask and RADICAL-Pilot, with respect to their ability to support data analytics requirements on HPC resources. We investigate the data analysis requirements of Molecular Dynamics (MD) simulations, which are significant consumers of supercomputing cycles and produce immense amounts of data: a typical large-scale MD simulation of a physical system of O(100,000) atoms can produce from O(10) GB to O(1000) GB of data. We propose and evaluate different approaches for parallelizing a representative set of MD trajectory analysis algorithms, in particular the computation of path similarity and the identification of connected atoms. We evaluate Spark, Dask and RADICAL-Pilot with respect to the abstractions and runtime engine capabilities they provide to support these algorithms. We provide a conceptual basis for comparing and understanding the different frameworks that enables users to select the optimal system for their application. Further, we provide a quantitative performance analysis of the different algorithms across the three frameworks using different high-performance computing resources.
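To make the task-parallel decomposition of path similarity concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes the Hausdorff distance as the path metric (the abstract does not name one), represents each trajectory as a plain list of points, and uses Python's standard-library thread pool as a stand-in for Spark, Dask or RADICAL-Pilot. The function names and the toy data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import math


def directed_hausdorff(path_a, path_b):
    """Largest distance from any point of path_a to its nearest point of path_b."""
    return max(min(math.dist(p, q) for q in path_b) for p in path_a)


def hausdorff(path_a, path_b):
    """Symmetric Hausdorff distance between two paths (sequences of points)."""
    return max(directed_hausdorff(path_a, path_b),
               directed_hausdorff(path_b, path_a))


def pairwise_path_similarity(paths, max_workers=4):
    """Compute the distance for every unordered pair of paths, one task per pair."""
    pairs = list(combinations(range(len(paths)), 2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each pair is independent of the others, so the pairs can be
        # distributed as separate tasks with no communication between them.
        distances = list(pool.map(
            lambda ij: hausdorff(paths[ij[0]], paths[ij[1]]), pairs))
    return dict(zip(pairs, distances))


# Toy "trajectories": each path is a short list of 2D points (hypothetical data).
toy_paths = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
    [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)],
]
result = pairwise_path_similarity(toy_paths)
```

The thread pool only illustrates how the work splits into independent tasks. In CPython it does not speed up this pure-Python arithmetic, and a real analysis would hand the same per-pair tasks to a process pool or to one of the frameworks discussed above.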
