Hypothetical Reasoning via Provenance Abstraction

07/10/2020
by Daniel Deutch, et al.

Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Previous work has shown that fine-grained data provenance can make such analysis more efficient: instead of a costly re-execution of the underlying application, hypothetical scenarios are applied to a pre-computed provenance expression. However, storing provenance for complex queries and large-scale data incurs a significant overhead, which is often a barrier to the adoption of provenance-based solutions. To this end, we present a framework that reduces provenance size. Our approach reduces the provenance granularity using user-defined abstraction trees over the provenance variables; the granularity is chosen based on the anticipated hypothetical scenarios. We formalize the tradeoff between provenance size and the supported granularity of hypothetical reasoning, study the complexity of the resulting optimization problem, and provide efficient algorithms for tractable cases and heuristics for the others. We experimentally study the performance of our solution for various queries and abstraction trees. Our study shows that the algorithms generally lead to a substantial speedup of hypothetical reasoning, with a reasonable loss of accuracy.
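The idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the polynomial representation, the variable names (`m1`, `m3`, `p1`, `p2`, `q1`), the coefficients, and the helper functions `abstract` and `evaluate` are all assumptions made for illustration. A provenance expression is modeled as a polynomial over variables, a hypothetical scenario as an assignment of multipliers to those variables, and an abstraction as a mapping that replaces leaf variables with a chosen ancestor in a user-defined tree, so that monomials which become identical are merged.

```python
from collections import defaultdict

# Hypothetical provenance polynomial: each monomial (a sorted tuple of
# variable names) maps to its coefficient. Here m* stand for months and
# p* for products; all values are invented for illustration.
provenance = {
    ("m1", "p1"): 220.0,
    ("m1", "p2"): 180.0,
    ("m3", "p1"): 240.0,
    ("m3", "p2"): 160.0,
}

# One possible cut through a hypothetical abstraction tree:
# both months are replaced by their common ancestor q1 (a quarter).
month_to_quarter = {"m1": "q1", "m3": "q1"}


def abstract(poly, mapping):
    """Rename variables via `mapping` and merge monomials that coincide."""
    merged = defaultdict(float)
    for monomial, coeff in poly.items():
        renamed = tuple(sorted(mapping.get(v, v) for v in monomial))
        merged[renamed] += coeff
    return dict(merged)


def evaluate(poly, scenario, default=1.0):
    """Apply a scenario (variable -> multiplier) to the polynomial.

    Variables absent from the scenario keep the neutral multiplier `default`.
    """
    total = 0.0
    for monomial, coeff in poly.items():
        term = coeff
        for v in monomial:
            term *= scenario.get(v, default)
        total += term
    return total


abstracted = abstract(provenance, month_to_quarter)

# A scenario at product granularity is still answered exactly,
# but from a polynomial with half as many monomials.
halve_p1 = {"p1": 0.5}
exact = evaluate(provenance, halve_p1)
coarse = evaluate(abstracted, halve_p1)

# A scenario on a single month can no longer be expressed after the
# abstraction: m1 is gone, so the multiplier has no effect.
discount_m1 = {"m1": 0.9}
fine_answer = evaluate(provenance, discount_m1)
lost_answer = evaluate(abstracted, discount_m1)
```

In this sketch the abstracted polynomial has two monomials instead of four and agrees with the original on scenarios phrased over `p1`, `p2`, or the whole quarter `q1`, while a scenario that changes only `m1` is no longer distinguishable. This mirrors the tradeoff between provenance size and supported granularity described above.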


research
07/10/2020

COBRA: Compression via Abstraction of Provenance for Hypothetical Reasoning

Data analytics often involves hypothetical reasoning: repeatedly modifyi...
research
02/27/2021

On Optimizing the Trade-off between Privacy and Utility in Data Provenance

Organizations that collect and analyze data may wish or be mandated by r...
research
10/19/2022

MuGER^2: Multi-Granularity Evidence Retrieval and Reasoning for Hybrid Question Answering

Hybrid question answering (HQA) aims to answer questions over heterogene...
research
12/21/2019

Measuring Dataset Granularity

Despite the increasing visibility of fine-grained recognition in our fie...
research
04/14/2021

Virtines: Virtualization at Function Call Granularity

Virtual execution environments provide strong isolation, on-demand infra...
research
04/01/2020

Impact of Semantic Granularity on Geographic Information Search Support

The Information Retrieval research has used semantics to provide accurat...
research
09/02/2019

HiCoRe: Visual Hierarchical Context-Reasoning

Reasoning about images/objects and their hierarchical interactions is a ...
