OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance from Database Logs [Technical Report]

10/25/2022
by Fotis Psallidas, et al.

Provenance encodes information that connects datasets, their generation workflows, and associated metadata (e.g., who executed a query and when). As such, it is instrumental for a wide range of critical governance applications (e.g., observability and auditing). Unfortunately, in the context of database systems, extracting coarse-grained provenance is a long-standing problem due to the complexity and sheer volume of database workflows. Provenance extraction from query event logs has recently been proposed as a favorable approach because, in principle, it can produce meaningful provenance graphs for provenance applications. Current approaches, however, (a) add substantial overhead to the database and to provenance extraction workflows and (b) extract provenance that is noisy, omits query execution dependencies, and is not rich enough for upstream applications. To address these problems, we introduce OneProvenance: an efficient system for extracting provenance from query event logs. OneProvenance addresses the unique challenges of log-based extraction by (a) identifying query execution dependencies through efficient log analysis, (b) extracting provenance through novel event transformations that account for query dependencies, and (c) introducing effective filtering optimizations. Our thorough experimental analysis shows that OneProvenance can improve extraction by up to 18X compared to state-of-the-art baselines; our optimizations reduce extraction noise and improve performance even further. OneProvenance is deployed at scale by Microsoft Purview and actively supports customer provenance extraction needs (https://bit.ly/3N2JVGF).
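To make the idea of log-based, coarse-grained extraction concrete, the following is a minimal sketch and not OneProvenance's actual algorithm. It assumes a hypothetical log record (`QueryEvent`) that already lists each query's parent query, input datasets, and output datasets; the field names, the `extract_provenance` helper, and the read-only filter are illustrative assumptions. The sketch records parent-to-child query dependencies, emits dataset-to-query and query-to-dataset edges, and skips queries that write nothing as a simple form of noise filtering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueryEvent:
    # Hypothetical log record; these field names are assumptions,
    # not the schema of any real database log.
    query_id: str
    parent_id: Optional[str]  # enclosing query (e.g. a stored procedure), if any
    user: str
    inputs: tuple = ()        # datasets read by the query
    outputs: tuple = ()       # datasets written by the query


def extract_provenance(events, drop_read_only=True):
    """Return (edges, children).

    edges    -- set of (source, target) pairs linking datasets and queries
    children -- dict mapping a parent query id to its child query ids
    """
    edges = set()
    children = {}
    for ev in events:
        # Record the execution dependency between nested queries.
        if ev.parent_id is not None:
            children.setdefault(ev.parent_id, []).append(ev.query_id)
        # Illustrative filter: queries that write nothing add no lineage.
        if drop_read_only and not ev.outputs:
            continue
        for src in ev.inputs:
            edges.add((src, ev.query_id))
        for dst in ev.outputs:
            edges.add((ev.query_id, dst))
    return edges, children


log = [
    QueryEvent("q1", None, "alice"),  # procedure call with no direct I/O
    QueryEvent("q2", "q1", "alice", inputs=("sales",), outputs=("sales_daily",)),
    QueryEvent("q3", "q1", "alice", inputs=("sales_daily",)),  # read-only
]
edges, children = extract_provenance(log)
```

In this toy log, `q2` and `q3` are tied back to the procedure call `q1` through `children`, while only `q2` contributes dataset-level edges because it is the only query that writes a dataset.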


