Building Containerized Environments for Reproducibility and Traceability of Scientific Workflows

09/17/2020
by Paula Olaya, et al.

Scientists rely on simulations to study natural phenomena. Trusting the simulation results is vital to advancing science in any field. One approach to building trust is to ensure the reproducibility and traceability of simulations by annotating executions at the system level and by generating record trails of data moving through the simulation workflow. In this work, we present a system-level solution that leverages the intrinsic characteristics of containers (i.e., portability, isolation, encapsulation, and unique identifiers). Our solution consists of a containerized environment capable of annotating workflows, capturing provenance metadata, and building record trails. We assess our environment on four different workflows and measure containerization costs in terms of time and space. Our solution, built with a tolerable time and space overhead, enables transparent and automatic provenance metadata collection and access, an easy-to-read record trail, and tight connections between data and metadata.

research
05/16/2019

The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments

The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to re...
research
04/20/2022

MEDFORD: A human and machine readable metadata markup language

Reproducibility of research is essential for science. However, in the wa...
research
05/20/2019

Custom Execution Environments with Containers in Pegasus-enabled Scientific Workflows

Science reproducibility is a cornerstone feature in scientific workflows...
research
05/04/2020

EngMeta – Metadata for Computational Engineering

Computational engineering generates knowledge through the analysis and i...
research
04/12/2023

A Decision Tree to Shepherd Scientists through Data Retrievability

Reproducibility is a crucial aspect of scientific research that involves...
research
10/08/2019

Simulation Reproducibility of a Chaotic Circuit

An evergreen scientific feature is the ability for scientific works to b...
research
02/22/2022

Enabling Reproducibility and Meta-learning Through a Lifelong Database of Experiments (LDE)

Artificial Intelligence (AI) development is inherently iterative and exp...
