Probabilistic Databases with an Infinite Open-World Assumption

07/02/2018
by Martin Grohe, et al.

Probabilistic databases (PDBs) introduce uncertainty into relational databases by specifying probabilities for several possible instances. That is, they are traditionally finite probability spaces over database instances. Such PDBs inherently make a closed-world assumption: facts that do not occur are assumed to be impossible, rather than just unlikely. As convincingly argued by Ceylan et al. (KR 2016), this results in implausibilities and clashes with intuition. An open-world assumption, where facts not explicitly listed may have a small positive probability, can yield more reasonable results. The corresponding open-world model of Ceylan et al., however, assumes that all entities in the PDB come from a fixed finite universe. In this work, we take one further step and propose a model of "truly" open-world PDBs with an infinite universe. This is natural when, for example, we consider entities to be integers, real numbers, or strings. While the probability space may become infinitely large, all instances of a PDB remain finite. We provide a sound mathematical framework for infinite PDBs that generalizes the existing theory of finite PDBs. Our main results concern tuple-independent PDBs: we present a generic construction showing that such PDBs exist in the infinite setting, and we characterize their existence in general. This model can be used to apply open-world semantics to finite PDBs. The construction can also be extended to so-called block-independent-disjoint probabilistic databases. Algorithmic questions are not the focus of this paper, but we show how query evaluation algorithms can be lifted from finite PDBs to perform approximate evaluation in infinite PDBs.
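
To make the idea of approximate evaluation by truncation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical countable family of independent facts whose i-th fact has marginal probability 2^-i (so the probabilities have a convergent sum), and it estimates the probability that at least one fact of the family is present by looking only at the first n facts. By the union bound, the error is at most the total probability mass of the ignored facts. All function names are illustrative.

```python
from fractions import Fraction
from typing import Tuple


def fact_probability(i: int) -> Fraction:
    """Hypothetical marginal probability of the i-th fact (i >= 1): 2^-i."""
    return Fraction(1, 2 ** i)


def tail_bound(n: int) -> Fraction:
    """Sum of fact_probability(i) over all i > n; equals 2^-n for this family."""
    return Fraction(1, 2 ** n)


def prob_some_fact_present(n: int) -> Fraction:
    """Probability that at least one of the first n independent facts is present."""
    prob_none = Fraction(1)
    for i in range(1, n + 1):
        prob_none *= 1 - fact_probability(i)
    return 1 - prob_none


def approximate(epsilon: Fraction) -> Tuple[int, Fraction]:
    """Pick the smallest n whose ignored tail mass is at most epsilon,
    then evaluate the query on the finite truncation to the first n facts."""
    n = 0
    while tail_bound(n) > epsilon:
        n += 1
    return n, prob_some_fact_present(n)


if __name__ == "__main__":
    n, estimate = approximate(Fraction(1, 100))
    print(n, float(estimate))
```

The truncated value never overestimates the true probability, and it differs from it by at most `tail_bound(n)`, because the two can only disagree when some fact beyond index n is present. Exact rational arithmetic via `Fraction` keeps the error bound free of floating-point noise.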


