Indexing Execution Patterns in Workflow Provenance Graphs through Generalized Trie Structures

07/19/2018
by Esteban García-Cuesta, et al.
Expert System S.p.A.
Universidad Europea

In recent years, scientific workflows have matured enough to be used in production settings. Despite this maturity, there is still a shortage of tools for searching, adapting, and reusing workflows, which hinders broader adoption by scientific communities. Because machine-readable scientific metadata is scarce and workflow specification formats and representations are heterogeneous, new ways are needed to leverage alternative sources of information that complement existing approaches. In this paper we address these limitations by applying statistically enriched generalized trie structures to workflow execution provenance information, in order to assist the analysis, indexing, and search of scientific workflows. Our method bridges the gap between what a workflow is supposed to do, according to its specification and related metadata, and what it actually does, as recorded in its provenance execution trace. We also show that the proposed method outperforms SPARQL 1.1 Property Paths for querying provenance graphs.
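To make the idea of indexing execution patterns more concrete, the following is a minimal sketch of a plain prefix trie annotated with counts. It is not the generalized trie structure of the paper, whose details the abstract does not give. It assumes, purely for illustration, that each provenance trace has already been flattened into an ordered list of step names; the class names, workflow identifiers, and step names are all hypothetical.

```python
class TrieNode:
    """One node per step name along a shared execution prefix."""

    def __init__(self):
        self.children = {}      # step name -> TrieNode
        self.count = 0          # number of inserted traces passing through this node
        self.workflows = set()  # ids of the workflows those traces belong to


class ExecutionTrie:
    """Illustrative index over step sequences (an assumption, not the paper's design)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, workflow_id, steps):
        # Walk the trie one step at a time, creating nodes as needed
        # and updating the statistics stored on each visited node.
        node = self.root
        for step in steps:
            node = node.children.setdefault(step, TrieNode())
            node.count += 1
            node.workflows.add(workflow_id)

    def lookup(self, prefix):
        # Return (count, workflow ids) for traces that start with `prefix`.
        node = self.root
        for step in prefix:
            node = node.children.get(step)
            if node is None:
                return 0, set()
        return node.count, set(node.workflows)


# Hypothetical traces: three workflows sharing an initial "fetch" step.
trie = ExecutionTrie()
trie.insert("wf1", ["fetch", "align", "plot"])
trie.insert("wf2", ["fetch", "align", "export"])
trie.insert("wf3", ["fetch", "filter"])

count, workflows = trie.lookup(["fetch", "align"])
print(count, sorted(workflows))  # 2 ['wf1', 'wf2']
```

In this sketch a lookup costs time proportional to the length of the queried prefix, regardless of how many traces are indexed, which is the general property that makes trie-style indexes attractive for pattern search.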
