Utilizing Provenance in Reusable Research Objects

06/17/2018
by Zhihao Yuan, et al.

Science is conducted collaboratively, often requiring the sharing of knowledge about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets; more often it also includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregating and identifying the diverse elements of computational experiments. Aggregation is necessary, but it is not sufficient for sharing computational experiments: other users must also be able to easily recompute on these shared research objects. Computational provenance is often the key to enabling such reuse. In this paper, we show how reusable research objects can utilize provenance to correctly repeat a previous reference execution, to construct a subset of a research object for partial reuse, and to reuse existing contents of a research object for modified reuse. We describe two methods to summarize provenance that aid in understanding the contents and past executions of a research object. The first method obtains a process view by collapsing low-level system information, and the second method obtains a summary graph by grouping related nodes and edges, with the goal of obtaining a graph view similar to an application workflow. Through detailed experiments, we show the efficacy and efficiency of our algorithms.
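
To make the idea of a process view more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a provenance graph whose nodes are either processes or files and whose edges point in the direction of data flow; the function name `process_view`, the `node_kind` mapping, and the example node names are all hypothetical. File nodes are collapsed so that only process-to-process dependencies remain.

```python
from collections import defaultdict


def process_view(edges, node_kind):
    """Collapse file nodes, keeping only process-to-process dependencies.

    edges: iterable of (src, dst) pairs; data flows from src to dst.
    node_kind: dict mapping each node id to "process" or "file".
    Both structures are assumptions made for this illustration.
    """
    writers = defaultdict(set)  # file -> processes that wrote it
    readers = defaultdict(set)  # file -> processes that read it
    collapsed = set()

    for src, dst in edges:
        kinds = (node_kind[src], node_kind[dst])
        if kinds == ("process", "file"):
            writers[dst].add(src)
        elif kinds == ("file", "process"):
            readers[src].add(dst)
        elif kinds == ("process", "process"):
            # Direct process edges (for example, parent to child) are kept.
            collapsed.add((src, dst))

    # A writer of a file becomes a direct predecessor of each of its readers.
    for file_node, file_writers in writers.items():
        for writer in file_writers:
            for reader in readers.get(file_node, ()):
                if writer != reader:
                    collapsed.add((writer, reader))
    return collapsed


# Hypothetical three-step pipeline: generate, clean, plot.
example_kinds = {
    "gen": "process", "raw.csv": "file",
    "clean": "process", "clean.csv": "file",
    "plot": "process",
}
example_edges = [
    ("gen", "raw.csv"), ("raw.csv", "clean"),
    ("clean", "clean.csv"), ("clean.csv", "plot"),
]
example_view = process_view(example_edges, example_kinds)
```

In this sketch, files that are only read (pure inputs) or only written (final outputs) drop out of the view entirely, which is one possible design choice; a real summary might instead keep them as annotations on the adjacent process.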


research
07/18/2017

Sciunits: Reusable Research Objects

Science is conducted collaboratively, often requiring knowledge sharing ...
research
09/27/2018

Enabling FAIR Research in Earth Science through Research Objects

Data-intensive science communities are progressively adopting FAIR pract...
research
01/17/2022

Sharing Begins at Home

The broad sharing of research data is widely viewed as of critical impor...
research
02/17/2022

The Development and Prospect of Code Clone

The application of code clone technology accelerates code search, improv...
research
01/05/2012

Information Distance: New Developments

In pattern recognition, learning, and data mining one obtains informatio...
research
10/16/2017

Collaboration Spheres: a Visual Metaphor to Share and Reuse Research Objects

Research Objects (ROs) are semantically enhanced aggregations of resourc...
research
05/11/2020

Structuring spreadsheets with ObjTables enables data quality control, reuse, and integration

A central challenge in science is to understand how systems behaviors em...
