Custom Execution Environments with Containers in Pegasus-enabled Scientific Workflows

05/20/2019
by   Karan Vahi, et al.

Reproducibility is a cornerstone of scientific workflows. In most cases, it has been implemented as the ability to exactly reproduce the computational steps taken to reach the final results. While these steps are often completely described, including the input parameters, datasets, and codes, the environment in which they are executed is described only at a high level, by endpoints and operating system names and versions. Though this may be sufficient for reproducibility in the short term, systems evolve and are replaced over time, which breaks the reproducibility of the underlying workflow. Containers are a natural solution to this problem: they are well defined, have a lifetime independent of the underlying system, and can be user-controlled to provide custom environments where needed. This paper highlights some unique challenges that may arise when using containers in distributed scientific workflows. It further explores how the Pegasus Workflow Management System implements container support to address these challenges.

research
08/09/2017

A Collaborative Approach to Computational Reproducibility

Although a standard in natural science, reproducibility has been only ep...
research
12/24/2020

Reproducible Workflow

Reproducibility has been consistently identified as an important compone...
research
11/23/2022

Towards Advanced Monitoring for Scientific Workflows

Scientific workflows consist of thousands of highly parallelized tasks e...
research
09/17/2020

Building Containerized Environments for Reproducibility and Traceability of Scientific Workflows

Scientists rely on simulations to study natural phenomena. Trusting the ...
research
11/10/2022

Evaluation of tools for describing, reproducing and reusing scientific workflows

In the field of computational science and engineering, workflows often e...
research
07/01/2021

Toward Interoperable Cyberinfrastructure: Common Descriptions for Computational Resources and Applications

The user-facing components of the Cyberinfrastructure (CI) ecosystem, sc...
research
03/04/2021

Restoring Execution Environments of Jupyter Notebooks

More than ninety percent of published Jupyter notebooks do not state dep...
