ProvLet: A Provenance Management Service for Long Tail Microscopy Data
Provenance management must be present to enhance the overall security and reliability of long-tail microscopy (LTM) data management systems. However, there are challenges in provenance for domains with LTM data. The provenance data need to be collected more frequently, which increases system overheads (in terms of computation and storage) and results in scalability issues. Moreover, in most scientific application domains a provenance solution must consider network-related events as well. Therefore, provenance data in LTM data management systems are highly diverse and must be organized and processed carefully. In this paper, we introduce a novel provenance service, called ProvLet, to collect, distribute, analyze, and visualize provenance data in LTM data management systems. This means (1) we address how to filter and store the desired transactions on disk; (2) we consider a data organization model at higher level data abstractions, suitable for step-by-step scientific experiments, such as datasets and collections, and develop provenance algorithms over these data abstractions, rather than solutions considering low-level abstractions such as files and folders. (3) We utilize ProvLet's log files and visualize provenance information for further forensics explorations. The validation of ProvLet with actual long tail microscopy data, collected over a period of six years, shows a provenance service that yields a low system overhead and enables scalability.
READ FULL TEXT