Validation and Inference of Schema-Level Workflow Data-Dependency Annotations

07/25/2018
by Shawn Bowers, et al.

An advantage of scientific workflow systems is their ability to collect runtime provenance information as an execution trace. Traces include the computation steps invoked as part of the workflow run along with the corresponding data consumed and produced by each workflow step. The information captured by a trace is used to infer "lineage" relationships among data items, which can help answer provenance queries to find workflow inputs that were involved in producing specific workflow outputs. Determining lineage relationships, however, requires an understanding of the dependency patterns that exist between each workflow step's inputs and outputs, and this information is often under-specified or generally assumed by workflow systems. For instance, most approaches assume all outputs depend on all inputs, which can lead to lineage "false positives". In prior work, we defined annotations for specifying detailed dependency relationships between inputs and outputs of computation steps. These annotations are used to define corresponding rules for inferring fine-grained data dependencies from a trace. In this paper, we extend our previous work by considering the impact of dependency annotations on workflow specifications. In particular, we provide a reasoning framework to ensure the set of dependency annotations on a workflow specification is consistent. The framework can also infer a complete set of annotations given a partially annotated workflow. Finally, we describe an implementation of the reasoning framework using answer-set programming.
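
The lineage "false positive" problem described above can be illustrated with a minimal sketch. It is not the authors' framework (which the abstract says is implemented with answer-set programming); the trace layout, the step names `split` and `plot`, the data-item names, and the position-based `ANNOTATIONS` format are all hypothetical, chosen only to contrast the default "every output depends on every input" assumption with per-step dependency annotations.

```python
from itertools import product

# Hypothetical trace: each invocation records the data items it consumed and produced.
TRACE = [
    {"step": "split", "inputs": ["in1", "in2"], "outputs": ["d1", "d2"]},
    {"step": "plot", "inputs": ["d1"], "outputs": ["out1"]},
]

# Hypothetical annotation format: for a step, map each output position
# to the input positions it actually depends on.
ANNOTATIONS = {"split": {0: [0], 1: [1]}}


def dependency_edges(trace, annotations=None):
    """Return (output_item, input_item) pairs implied by the trace."""
    edges = set()
    for inv in trace:
        ann = (annotations or {}).get(inv["step"])
        if ann is None:
            # Default assumption: every output depends on every input.
            edges.update(product(inv["outputs"], inv["inputs"]))
        else:
            # Annotated step: only the declared output/input pairs.
            for out_pos, in_positions in ann.items():
                for in_pos in in_positions:
                    edges.add((inv["outputs"][out_pos], inv["inputs"][in_pos]))
    return edges


def workflow_inputs(item, edges, trace):
    """Follow dependency edges backwards and keep items no step produced."""
    produced = {o for inv in trace for o in inv["outputs"]}
    seen, stack = set(), [item]
    while stack:
        current = stack.pop()
        for out, inp in edges:
            if out == current and inp not in seen:
                seen.add(inp)
                stack.append(inp)
    return {d for d in seen if d not in produced}


coarse = workflow_inputs("out1", dependency_edges(TRACE), TRACE)
fine = workflow_inputs("out1", dependency_edges(TRACE, ANNOTATIONS), TRACE)
print(sorted(coarse))  # ['in1', 'in2']
print(sorted(fine))    # ['in1']
```

Under the all-to-all assumption, `in2` is reported as part of the lineage of `out1` even though, in this made-up example, only `in1` flows into it. With the annotation on `split`, that false positive disappears.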


research
04/15/2019

Bounded and Approximate Strong Satisfiability in Workflows

There has been a considerable amount of interest in recent years in the ...
research
09/01/2020

WorkflowHub: Community Framework for Enabling Scientific Workflow Research and Development – Technical Report

Scientific workflows are a cornerstone of modern scientific computing. T...
research
06/18/2019

The Workflow Trace Archive: Open-Access Data from Public and Private Computing Infrastructures -- Technical Report

Realistic, relevant, and reproducible experiments often need input trace...
research
08/25/2018

Efficiently Processing Workflow Provenance Queries on SPARK

In this paper, we investigate how we can leverage Spark platform for eff...
research
09/15/2023

Speeding up charge exchange recombination spectroscopy analysis in support of NERSC/DIII-D realtime workflow

We report optimization work made in support of the development of a real...
research
07/19/2018

Indexing Execution Patterns in Workflow Provenance Graphs through Generalized Trie Structures

Over the last years, scientific workflows have become mature enough to b...
research
09/26/2018

Results in Workflow Resiliency: Complexity, New Formulation, and ASP Encoding

First proposed by Wang and Li in 2007, workflow resiliency is a policy a...
