An Intermediate Data-driven Methodology for Scientific Workflow Management System to Support Reusability

by Debasish Chakroborti, et al.

In this thesis, we first propose an intermediate data management scheme for a scientific workflow management system (SWfMS). In our second study, we explore the possibilities and introduce an automatic recommendation technique for a SWfMS based on real-world workflow data (i.e., Galaxy [1] workflows). Our investigations show that the proposed technique can facilitate 51% intermediate data of previous workflows and can reduce 74% workflow buildings in a SWfMS. We then propose an adaptive version of the technique that considers the states of tools in a SWfMS, which shows around 40% reusability for workflows. In our fourth study, we conduct several experiments to analyze the performance and explore the effectiveness of the technique in a SWfMS across various environments. The technique is designed to reduce storage cost, increase data reusability, and speed up workflow execution, and, to the best of our knowledge, it is the first of its kind. The detailed architecture and evaluation of the technique are presented in this thesis. We believe our findings and the developed system will contribute significantly to the research domain of SWfMSs.




