Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2

07/13/2021
by Franz Poeschel, et al.

This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. With the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for describing scientific data according to the Open Standard for Particle-Mesh Data (openPMD). It addresses recent challenges posed by hardware heterogeneity by decoupling the data description used in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. A streaming-based workflow thus allows for standalone codes instead of tightly coupled plugins, using a unified streaming-aware API and leveraging the high-speed communication infrastructure available in modern compute clusters for massive data exchange. We identify new challenges in resource allocation and in the need for flexible data distribution strategies, and demonstrate their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO, as well as the ability to increase throughput by avoiding filesystem bottlenecks.
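The loose coupling described in the abstract can be illustrated with a small sketch. It is not the openPMD-api or ADIOS2 interface: it uses only the Python standard library, with a bounded in-memory queue standing in for the streaming backend and two threads standing in for the separately executed applications. The names `producer`, `consumer`, `run_pipeline`, and `queue_depth` are all hypothetical and chosen for illustration only.

```python
import queue
import threading


def producer(stream, n_steps, n_cells):
    """Stand-in for a simulation: publishes one output step at a time."""
    for step in range(n_steps):
        # Hypothetical field data for this step (no file is written).
        data = [float(step * n_cells + i) for i in range(n_cells)]
        # put() blocks when the queue is full, so a slow reader
        # throttles the writer instead of data piling up.
        stream.put((step, data))
    stream.put(None)  # end-of-stream marker


def consumer(stream, results):
    """Stand-in for an in situ analysis: reduces each step as it arrives."""
    while True:
        item = stream.get()
        if item is None:
            break
        step, data = item
        results[step] = sum(data) / len(data)  # mean value of the step


def run_pipeline(n_steps=4, n_cells=8, queue_depth=2):
    """Run writer and reader concurrently, coupled only through the stream."""
    stream = queue.Queue(maxsize=queue_depth)
    results = {}
    writer = threading.Thread(target=producer, args=(stream, n_steps, n_cells))
    reader = threading.Thread(target=consumer, args=(stream, results))
    writer.start()
    reader.start()
    writer.join()
    reader.join()
    return results


if __name__ == "__main__":
    print(run_pipeline())
```

The point of the sketch is that writer and reader share nothing except the stream, so either side could be replaced by a standalone code. In an actual deployment the two sides would be separate applications, e.g. in individual MPI contexts, exchanging data over the cluster interconnect rather than through a thread-safe queue.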


