ProvLet: A Provenance Management Service for Long Tail Microscopy Data

09/22/2021
by Hessam Moeini, et al.

Provenance management is needed to enhance the overall security and reliability of long-tail microscopy (LTM) data management systems. However, provenance in domains with LTM data poses several challenges. Provenance data must be collected more frequently, which increases computation and storage overheads and leads to scalability issues. Moreover, in most scientific application domains a provenance solution must also consider network-related events. As a result, provenance data in LTM data management systems are highly diverse and must be organized and processed carefully. In this paper, we introduce a novel provenance service, called ProvLet, to collect, distribute, analyze, and visualize provenance data in LTM data management systems. Specifically, (1) we address how to filter the desired transactions and store them on disk; (2) we adopt a data organization model built on higher-level data abstractions, such as datasets and collections, that suit step-by-step scientific experiments, and we develop provenance algorithms over these abstractions rather than over low-level abstractions such as files and folders; and (3) we use ProvLet's log files to visualize provenance information for further forensic exploration. Validation of ProvLet with actual long-tail microscopy data, collected over a period of six years, shows that the service incurs low system overhead and scales.
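To make the first two points more concrete, the following is a minimal sketch of filtering transactions, writing them to a log, and grouping the log by dataset and collection rather than by file. It is not ProvLet's implementation: the `Event` fields, the `RECORDED_ACTIONS` set, the JSON-lines log format, and all function names are assumptions chosen for illustration only.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from io import StringIO
from typing import Dict, Iterable, List, TextIO, Tuple

# Hypothetical set of transaction types worth recording (an assumption,
# not taken from the paper).
RECORDED_ACTIONS = {"create", "update", "delete", "share"}


@dataclass(frozen=True)
class Event:
    """One transaction, described at the dataset/collection level."""
    timestamp: str   # ISO 8601 string
    user: str
    action: str
    collection: str
    dataset: str


def filter_events(events: Iterable[Event]) -> List[Event]:
    """Keep only the transactions whose action is in RECORDED_ACTIONS."""
    return [e for e in events if e.action in RECORDED_ACTIONS]


def write_log(events: Iterable[Event], sink: TextIO) -> int:
    """Append each event as one JSON line; return the number written."""
    count = 0
    for event in events:
        sink.write(json.dumps(asdict(event)) + "\n")
        count += 1
    return count


def history_by_dataset(log_lines: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group logged actions by (collection, dataset), in log order."""
    history: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for line in log_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        key = (record["collection"], record["dataset"])
        history[key].append(f'{record["user"]}:{record["action"]}')
    return dict(history)


# Illustrative data; an in-memory buffer stands in for a log file on disk.
events = [
    Event("2021-09-22T10:00:00", "alice", "create", "cells", "run-01"),
    Event("2021-09-22T10:05:00", "bob", "view", "cells", "run-01"),
    Event("2021-09-22T10:09:00", "alice", "update", "cells", "run-01"),
    Event("2021-09-22T11:00:00", "carol", "share", "tissue", "scan-07"),
]
log = StringIO()
written = write_log(filter_events(events), log)
history = history_by_dataset(log.getvalue().splitlines())
```

In this sketch the read-only `view` event is dropped before it reaches the log, which is one simple way filtering could limit storage overhead; keying the history on `(collection, dataset)` mirrors the idea of reasoning over higher-level abstractions instead of individual files.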


