Probabilistic Query Evaluation with Bag Semantics

01/27/2022
by Martin Grohe, et al.

We initiate the study of probabilistic query evaluation under bag semantics, where tuples are allowed to be present with duplicates. We focus on self-join-free conjunctive queries and on probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: depending on the query, it is either solvable in polynomial time or #P-hard (Dalvi and Suciu, JACM 2012). Due to potentially unbounded multiplicities, the bag probabilistic databases we discuss are no longer finite objects, which requires a treatment of representation mechanisms. Moreover, the answer to a Boolean query is a probability distribution over the non-negative integers, rather than a probability distribution over {true, false}. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join-free conjunctive queries.
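
To give some intuition for why expectations can be easy to compute, here is a minimal sketch for one hypothetical Boolean query, Q = ∃x ∃y R(x) ∧ S(x, y), evaluated under bag semantics. It is not the algorithm from the paper. It assumes that the representation system lets us read off the expected multiplicity of each individual fact, and it uses only linearity of expectation together with the independence of different facts. The function name, the dictionary-based input format, and the example numbers are all assumptions made for illustration.

```python
def expected_multiplicity(expected_r, expected_s):
    """Expected answer multiplicity of Q = exists x, y: R(x) AND S(x, y).

    expected_r: dict mapping a -> E[multiplicity of fact R(a)]   (assumed input format)
    expected_s: dict mapping (a, b) -> E[multiplicity of fact S(a, b)]

    Under bag semantics the answer multiplicity is the sum, over all
    valuations (a, b), of mult(R(a)) * mult(S(a, b)). Because R(a) and
    S(a, b) are different facts, their multiplicities are independent,
    so the expectation of each product is the product of expectations.
    """
    total = 0.0
    for (a, b), e_s in expected_s.items():
        # Facts absent from expected_r have expected multiplicity 0.
        total += expected_r.get(a, 0.0) * e_s
    return total


# Hypothetical per-fact expected multiplicities.
expected_r = {1: 0.5, 2: 2.0}
expected_s = {(1, "x"): 1.0, (1, "y"): 3.0, (2, "x"): 0.25}

print(expected_multiplicity(expected_r, expected_s))  # 0.5*1.0 + 0.5*3.0 + 2.0*0.25 = 2.5
```

The same shortcut does not carry over to the second problem in the abstract, the probability that the answer multiplicity is at most k, since that probability depends on the full distribution of the sum and not only on per-fact expectations.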

research
12/28/2021

Monads for Measurable Queries in Probabilistic Databases

We consider a bag (multiset) monad on the category of standard Borel spa...
research
04/06/2022

Computing expected multiplicities for bag-TIDBs with bounded multiplicities

In this work, we study the problem of computing a tuple's expected multi...
research
09/22/2022

Uniform Reliability for Unbounded Homomorphism-Closed Graph Queries

We study the uniform query reliability problem, which asks, for a fixed ...
research
12/02/2014

Approximate Lifted Inference with Probabilistic Databases

This paper proposes a new approach for approximate evaluation of #P-hard...
research
12/16/2021

Computing the Shapley Value of Facts in Query Answering

The Shapley value is a game-theoretic notion for wealth distribution tha...
research
12/26/2019

Solving a Special Case of the Intensional vs Extensional Conjecture in Probabilistic Databases

We consider the problem of exact probabilistic inference for Union of Co...
research
10/30/2020

Independence in Infinite Probabilistic Databases

Probabilistic databases (PDBs) model uncertainty in data. The current st...
