Independence in Infinite Probabilistic Databases

10/30/2020
by Martin Grohe, et al.
Probabilistic databases (PDBs) model uncertainty in data. The current standard is to view PDBs as finite probability spaces over relational database instances. Since many attributes in typical databases have infinite domains, such as integers, strings, or real numbers, it is often more natural to view PDBs as infinite probability spaces over database instances. In this paper, we lay the mathematical foundations of infinite probabilistic databases. We then focus on independence assumptions. Tuple-independent PDBs play a central role in the theory and practice of PDBs. Here, we study infinite tuple-independent PDBs as well as related models such as infinite block-independent disjoint PDBs. While the standard model of PDBs focuses on a set-based semantics, we also study tuple-independent PDBs with a bag semantics and propose Poisson-PDBs as a suitable model. It turns out that for uncountable PDBs, Poisson-PDBs form a natural model of tuple-independence even for a set semantics, and they fit in nicely with the mathematical theory of Poisson processes. We also propose a new approach to PDBs with an open-world assumption, addressing issues raised by Ceylan et al. (Proc. KR 2016) and generalizing their work, which is still rooted in finite tuple-independent PDBs. Moreover, for countable PDBs, we propose an approximate query answering algorithm.
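
For orientation, the finite tuple-independent model that the abstract takes as its starting point can be sketched in a few lines. This is only a toy illustration of the standard finite, set-based definition, not the paper's infinite construction: the fact names, the marginal probabilities, and the helper functions `instance_probability` and `all_instances` are all hypothetical and chosen purely for this example.

```python
from itertools import combinations
from math import prod

# Hypothetical marginal probabilities for three facts (illustrative values only).
marginals = {
    ("Temp", "office", 20): 0.5,
    ("Temp", "office", 21): 0.3,
    ("Temp", "lab", 19): 0.8,
}

def instance_probability(instance, marginals):
    """Probability that exactly the facts in `instance` are present.

    Under tuple-independence each fact is included independently, so the
    probability is a product over all possible facts: p for facts that are
    present, 1 - p for facts that are absent.
    """
    return prod(p if fact in instance else 1.0 - p
                for fact, p in marginals.items())

def all_instances(facts):
    """Enumerate every subset of the given facts as a frozenset."""
    facts = list(facts)
    for k in range(len(facts) + 1):
        for subset in combinations(facts, k):
            yield frozenset(subset)

# The instance probabilities form a distribution: summed over all
# 2^3 = 8 possible instances they add up to 1 (up to rounding).
total = sum(instance_probability(d, marginals)
            for d in all_instances(marginals))
print(total)
```

Enumerating all instances is only possible here because the set of facts is finite; with infinitely many possible facts, as studied in the paper, this direct enumeration is no longer available.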
