Universal (and Existential) Nulls

03/05/2018
by   Gösta Grahne, et al.
Concordia University

Research on incomplete information is quite mature when it comes to so-called existential nulls, where an existential null is a value stored in the database that represents an unknown object. Universal nulls, that is, values representing all possible objects, have received almost no attention. We remedy the situation in this paper by showing that a suitable finite representation mechanism for universal nulls, called Star Cylinders, can be developed based on the Cylindric Set Algebra of Henkin, Monk and Tarski. We provide a finitary version of the cylindric set algebra, called Cylindric Star Algebra, and show that our star-cylinders are closed under this algebra. Moreover, we show that any First Order Relational Calculus query over databases containing universal nulls can be translated into an equivalent expression in our cylindric star-algebra, and vice versa, in time polynomial in the size of the database. The representation mechanism is then extended to Naive Star Cylinders, which are star-cylinders allowing existential nulls in addition to universal nulls. For positive queries (with universal quantification), the well-known naive evaluation technique can still be applied on the existential nulls, thereby allowing polynomial time evaluation of certain answers on databases containing both universal and existential nulls. If precise answers are required, certain answer evaluation with universal and existential nulls remains in coNP. Note that the problem is coNP-hard already for positive existential queries and databases with only existential nulls. If inequalities ¬(x_i ≈ x_j) are allowed, reasoning over existential databases is known to be Π^p_2-complete, and it remains in Π^p_2 when universal nulls and full first order queries are allowed.


1 Introduction

In this paper we revisit the foundations of the relational model and unearth universal nulls, showing that they can be treated on par with the usual existential nulls [19, 12, 13]. Recall that an existential null in a tuple in a relation represents an existentially quantified variable in an atomic sentence. This corresponds to the intuition “value exists, but is unknown.” A universal null, on the other hand, does not represent anything unknown, but stands for all values of the domain. In other words, a universal null represents a universally quantified variable. Universal nulls have an obvious application in databases, as the following example shows. The symbol “*” denotes a universal null.

Example 1

Consider binary relations F(ollows) and H(obbies), where F(x, y) means that user x follows user y on a social media site, and H(x, z) means that z is a hobby of user x. Let the database be the following.

F
Alice   Chris
*       Alice
Bob     *
Chris   Bob
David   Bob

H
Alice   Movies
Alice   Music
Bob     Basketball

This is to be interpreted as expressing the facts that Alice follows Chris, and that Chris and David follow Bob. Alice is a journalist who would like to give everyone access to the articles she shares on the social media site. Therefore, everyone can follow Alice. Bob is the site administrator, and is granted access to all files anyone shares on the site. Consequently, Bob follows everyone. “Everyone” in this context means all current and possible future users. The query below, in domain relational calculus, asks for the interests of people who are followed by everyone:

x_4 . ∃x_2 ∃x_3 ∀x_1 (F(x_1, x_2) ∧ H(x_3, x_4) ∧ (x_2 ≈ x_3))    (1)

The answer to our example query is {(Movies), (Music)}. Note that star-nulls can also be part of an answer. For instance, the query would return all the tuples in .
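The reading of “*” in Example 1 can be checked with a small sketch. It is not the evaluation mechanism developed in this paper: it simply expands every “*” over a finite sample domain, and the extra constant `FRESH`, standing in for all users not mentioned in the database, is an assumption of the sketch, as are the function and variable names.

```python
from itertools import product

STAR = "*"  # universal null

# Star-tables of Example 1.
F = {("Alice", "Chris"), (STAR, "Alice"), ("Bob", STAR),
     ("Chris", "Bob"), ("David", "Bob")}
H = {("Alice", "Movies"), ("Alice", "Music"), ("Bob", "Basketball")}

def expand(table, domain):
    """Replace each '*' by every value of the (finite) sample domain."""
    rows = set()
    for row in table:
        choices = [domain if v == STAR else [v] for v in row]
        rows.update(product(*choices))
    return rows

# Hypothetical stand-in for "all current and possible future users".
FRESH = "<any other user>"
users = sorted({"Alice", "Bob", "Chris", "David", FRESH})

F_full = expand(F, users)

# People y such that every user x (including the fresh one) follows y.
followed_by_all = {y for y in users if all((x, y) in F_full for x in users)}

# Hobbies of those people.
answer = {hobby for (user, hobby) in H if user in followed_by_all}
print(sorted(answer))
```

Only Alice is followed by the fresh user, so only her hobbies are returned. Bob follows everyone but is not followed by everyone, which is why Basketball is not part of the answer.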

Another area of applications of “*”-nulls relates to intuitionistic, or constructive database logic. In the constructive four-valued approach of [15] and the three-valued approach of [13, 22] the proposition is not a tautology. In order for to be true, we need either a constructive proof of or a constructive proof of . Therefore both [15] and [22] assume that the database has a theory of the negative information, i.e. that , where contains the positive information and the negative information. The papers [15] and [22] then show how to transform an FO-query to a pair of queries such that returns the tuples for which is true in , and returns the tuples for which is true in  (i.e.  is false in ). It turns out that databases containing “*”-nulls are suitable for storing .

Example 2

Suppose that the instance in Example 1 represents , and that all negative information we have deduced about the relation, is that we know Alice doesn’t play Volleyball, that Bob only has Basketball as hobby, and that Chris has no hobby at all. This negative information about the relation is represented by the table below. Note that is part of .

Alice   Volleyball
Bob     * (except Basketball)
Chris   *

Suppose the query asks for people who have a hobby, that is . Then , and . Evaluating on returns , and evaluating on returns . Note that there is no closed-world assumption as the negative facts are explicit. Thus it is unknown whether David has a hobby or not.

Universal nulls were first studied in the early days of database theory by Biskup in [6]. This was a follow-up on his earlier paper on existential nulls [5]. The problem with Biskup’s approach, as noted by himself, was that the semantics for his algebra worked only for individual operators, not for compound expressions (i.e. queries). This was remedied in the foundational paper [19] by Imielinski and Lipski, as far as existential nulls were concerned. Universal nulls next came up in [20], where Imielinski and Lipski showed that Codd’s Relational Algebra could be embedded in CA, the Cylindric Set Algebra of Henkin, Monk, and Tarski [16, 17]. As a side remark, Imielinski and Lipski suggested that the semantics of their “*” symbol could be seen as modeling the universal null of Biskup. In this paper we follow their suggestion (Footnote 1: We note that Sundarmurthy et al. [25] have very recently proposed a construct related to our universal nulls, and studied ways of placing constraints on them.), and fully develop a finitary representation mechanism for databases with universal nulls, as well as an accompanying finitary algebra. We show that any FO (First Order / Domain Relational Calculus) query can be translated into an equivalent expression in a finitary version of CA, and that such algebraic expressions can be evaluated “naively” by the rules “* = *” and “* = a” for any constant “a.” Our finitary version is called Cylindric Star Algebra (SCA) and operates on finite relations containing constants and universal nulls “*.” These relations are called Star Cylinders and they are finite representations of a subclass of the infinite cylinders of Henkin, Monk, and Tarski. Interestingly, the class of star-cylinders is closed under first order querying, meaning that the infinite result of an FO query on an infinite instance represented by a finite sequence of finite star-cylinders can be represented by a finite star-cylinder. (Footnote 2: Consequently there is no need to require calculus queries to be “domain independent.”) This is achieved by showing that the class of star-cylinders is closed under our cylindric star-algebra, and that SCA as a query language is equivalent in expressive power to FO.

The Cylindric Set Algebra [16, 17] —as an algebraization of first order logic— is an algebra on sets of valuations of variables in an FO-formula. A valuation of variables can be represented as a tuple , where . The set of all valuations can then be represented by a relation of such tuples. In particular, if the FO-formula only involves a finite number of variables, then the representing relation has arity . Note however that has an infinite number of tuples, since the domain of the variables (such as the users of a social media site) should be assumed unbounded. One of the basic connections [16, 17] between FO and Cylindric Set Algebra is that, given any interpretation and FO-formula , the set of valuations under which is true in can be represented as such a relation . Moreover, each logical connective and quantifier corresponds to an operator in the Cylindric Set Algebra. Naturally disjunction corresponds to union, conjunction to intersection, and negation to complement. More interestingly, existential quantification on variable corresponds to cylindrification on column , where

and denotes the valuation (tuple) , where and for . The algebraic counterpart of universal quantification can be derived from cylindrification and complement, or be defined directly as inner cylindrification

In addition, in order to represent equality, the Cylindric Set Algebra also contains constant relations representing the equality . That is, is the set of all valuations , such that .

The objects and of [16, 17] are of course infinitary. In this paper we therefore develop a finitary representation mechanism, namely relations containing universal nulls “*” and certain equality literals. These objects are called Star Tables when they represent the records stored in the database. When used as run-time constructs in algebraic query evaluation, they will be called Star Cylinders. Example 1 showed star-tables in a database. The run-time variable binding pattern of the query (1), as well as its algebraic evaluation, is shown in the star-cylinders in Example 3 below.

Example 3

Continuing Example 1, in that database the atoms and of query (1) are represented by star-tables and , and the equality atom is represented by the star-cylinder . Note that these are positional relations; the “attributes” are added for illustrative purposes only.

Alice   Chris
*       Alice
Bob     *
Chris   Bob


Alice   Movies
Alice   Music
Bob     Basketball


*   *   *   *     2=3

The algebraic translation of query (1) is the SCA-expression

(2)

The intersection of and is carried out as star-intersection , where for instance . The result will contain 12 tuples, and when these are star-intersected with , the star-cylinder will act as a selection by columns 2 and 3 being equal. The result is the star-cylinder below.

*       Alice   Alice   Movies
*       Alice   Alice   Music
Bob     Alice   Alice   Movies
Bob     Alice   Alice   Music
Bob     Bob     Bob     Basketball
Chris   Bob     Bob     Basketball

The inner star-cylindrification on column 1 then yields

*       Alice   Alice   Movies
*       Alice   Alice   Music

Finally, applying outer star-cylindrifications on columns 2 and 3 of star-cylinder yields the final result

*   *   *   Movies
*   *   *   Music

The system can now return the answer, i.e. the values of column 4 in cylinder . Note that columns where all rows are “*” do not actually have to be materialized at any stage. Negation requires some additional details that will be introduced in Section 3.2.
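The evaluation steps of Example 3 can be retraced with the following sketch. It is a simplified reading of the star-operators that is consistent with the example, not the definitions of Section 3: tuples carry no equality literals, the diagonal is applied as a selection, and the case where both compared columns hold “*” is deliberately left unimplemented. All function names are hypothetical.

```python
STAR = "*"  # universal null

def unify(a, b):
    """'*' matches anything; two constants match only if they are equal."""
    if a == STAR:
        return True, b
    if b == STAR or a == b:
        return True, a
    return False, None

def star_intersect(c1, c2):
    """Pairwise, column-by-column unification of the tuples of two star-cylinders."""
    out = set()
    for s in c1:
        for t in c2:
            merged = [unify(a, b) for a, b in zip(s, t)]
            if all(ok for ok, _ in merged):
                out.add(tuple(v for _, v in merged))
    return out

def select_equal(c, i, j):
    """Effect of intersecting with the diagonal 'i = j' (1-based columns)."""
    out = set()
    for t in c:
        a, b = t[i - 1], t[j - 1]
        if a == STAR and b == STAR:
            # Would need an equality literal attached to the tuple.
            raise NotImplementedError("both columns are '*'")
        ok, v = unify(a, b)
        if ok:
            row = list(t)
            row[i - 1] = row[j - 1] = v
            out.add(tuple(row))
    return out

def inner_star_cyl(c, i):
    """Keep the tuples that hold for all values in column i."""
    return {t for t in c if t[i - 1] == STAR}

def outer_star_cyl(c, i):
    """Forget column i by overwriting it with '*'."""
    return {t[:i - 1] + (STAR,) + t[i:] for t in c}

# The star-tables of Example 3, padded with '*' to four columns.
CF = {("Alice", "Chris", STAR, STAR), (STAR, "Alice", STAR, STAR),
      ("Bob", STAR, STAR, STAR), ("Chris", "Bob", STAR, STAR)}
CH = {(STAR, STAR, "Alice", "Movies"), (STAR, STAR, "Alice", "Music"),
      (STAR, STAR, "Bob", "Basketball")}

joined = star_intersect(CF, CH)           # 12 tuples
selected = select_equal(joined, 2, 3)     # 6 tuples
inner = inner_star_cyl(selected, 1)       # 2 tuples
result = outer_star_cyl(outer_star_cyl(inner, 2), 3)
answer = {t[3] for t in result}
print(sorted(answer))
```

The intermediate sizes (12, 6 and 2 tuples) match the cylinders shown in the example, and column 4 of the final cylinder holds the answer.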

The aim of this paper is to develop a clean and sound modelling of universal nulls, and furthermore show that the model can be seamlessly extended to incorporate the existential nulls of Imielinski and Lipski [19]. We show that FO and our SCA are equivalent in expressive power when it comes to querying databases containing universal nulls, and that SCA queries can be evaluated (semi) naively. This will be done in three steps: In Section 2 we show the equivalence between FO and Cylindric Set Algebra over infinitary databases. This was of course only the starting point of [16, 17], and we recast the result here in terms of database theory. (Footnote 3: Van Den Bussche [9] has recently referred to [16, 17] in similar terms.) In Section 3 we introduce our finitary Cylindric Star Algebra. Section 3.1 develops the machinery for the positive case, where there is no negation in the query or database. This is then extended to include negation in Section 3.2. By these two sections we show that certain infinitary cylinders can be finitely represented as star-cylinders, and that our finitary Cylindric Star Algebra on finite star-cylinders mirrors the Cylindric Set Algebra on the infinite cylinders they represent. In Section 4 we tie these two results together, delivering the promised SCA evaluation of FO queries on databases containing universal nulls. In Section 5 we seamlessly extend our framework to also handle existential nulls, and show that naive evaluation can still be used for positive queries (allowing universal quantification, but not negation) on databases containing both universal and existential nulls. Section 6 then shows that all SCA expressions can be evaluated in time polynomial in the size of the database when only universal nulls are present. We also show that when both universal and existential nulls are present, the certain answer to any negation-free (allowing inner cylindrification, i.e. universal quantification) SCA-query can be evaluated naively in polynomial time. When negation is present it has long been known that the problem is coNP-complete for databases containing existential nulls. We show that the problem remains coNP-complete when universal nulls are allowed in addition to the existential ones. For databases containing existential nulls it has been known that database containment and view containment are coNP-complete and Π^p_2-complete, respectively. We also show that the addition of universal nulls does not increase these complexities.

2 Relational calculus and cylindric set algebra

Throughout this paper we assume a fixed schema , where each , , is a relational symbol with an associated positive integer , called the arity of . The symbol represents equality.


Logic. Our calculus is the standard domain relational calculus. Let be a countably infinite set of variables. We define the set of FO-formulas (over ) in the usual way: and are atomic formulas, and these are closed under and in a well-formed manner, possibly using parentheses for disambiguation.

Let be an FO-formula. We denote by the set of variables in , by the set of free variables in , and by the set of subformulas of (for formal definitions, see [1]). If has variables we say that is an FO-formula. We assume without loss of generality that each variable occurs only once in the formula, except in equality literals, and that a formula with variables uses variables .


Instances. Let be a countably infinite domain. An instance (over ) is a mapping that assigns a possibly infinite subset of to each relation symbol , and . Note that our instances are infinite model-theoretic ones. The set of tuples actually recorded in the database will be called the stored database (to be defined in Section 4).

In order to define the (standard) notion of truth of an FO-formula in an instance we first define a valuation to be a mapping . If is a valuation, a variable and , then denotes the valuation which is the same as , except . Then we use the usual recursive definition of , meaning instance satisfying under valuation , i.e.  if , if , and if for some , and so on. Our stored databases will be finite representations of infinite instances, so the semantics of answers to FO-queries will be defined in terms of the infinite instances:

Definition 1

Let be an instance, and an FO-formula with , . Then the answer to on is defined as


Algebra. As noted in [20] the relational algebra is really a disguised version of the Cylindric Set Algebra of Henkin, Monk, and Tarski [16, 17]. We shall therefore work directly with the Cylindric Set Algebra instead of Codd’s Relational Algebra. Apart from the conceptual clarity, the Cylindric Set Algebra will also allow us to smoothly introduce the promised universal nulls.

Let be a fixed positive integer. The basic building block of the Cylindric Set Algebra is an -dimensional cylinder . Note that a cylinder is essentially an infinite -ary relation. They will however be called cylinders, in order to distinguish them from instances. The rows in a cylinder will represent run-time variable valuations, whereas tuples in instances represent facts about the real world. We also have special cylinders called diagonals, of the form representing the equality . We can now define the Cylindric Set Algebra.

Definition 2

Let C and C′ be infinite n-dimensional cylinders over the domain D. The Cylindric Set Algebra consists of the following operators.

  1. Union: C ∪ C′. Set theoretic union.

  2. Complement: ¬C = D^n \ C.

  3. Outer cylindrification: c_i(C) = {ν ∈ D^n : ν(i/a) ∈ C, for some a ∈ D}.

The operation c_i is called outer cylindrification on the i:th dimension, and will correspond to existential quantification of variable x_i. For the geometric intuition behind the name cylindrification, see [16, 20]. Intersection is considered a derived operator, and we also introduce the following derived operator:

  1. Inner cylindrification: ɔ_i(C) = {ν ∈ D^n : ν(i/a) ∈ C, for all a ∈ D}, corresponding to universal quantification. Note that ɔ_i(C) = ¬c_i(¬C).
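The operators of Definition 2 can be tried out on a toy scale. The sketch below replaces the infinite domain by a three-element one (an assumption made only so that cylinders become finite sets of tuples) and checks that inner cylindrification coincides with complement, outer cylindrification, complement. The names `D`, `N`, `FULL` and the function names are illustrative.

```python
from itertools import product

D = (0, 1, 2)                        # finite stand-in for the infinite domain
N = 2                                # dimension of the cylinders
FULL = set(product(D, repeat=N))     # the cylinder of all valuations

def put(t, i, a):
    """The valuation t with column i (1-based) set to a."""
    return t[:i - 1] + (a,) + t[i:]

def complement(c):
    return FULL - c

def outer_cyl(c, i):
    """Valuations that fall into c for SOME value in column i."""
    return {t for t in FULL if any(put(t, i, a) in c for a in D)}

def inner_cyl(c, i):
    """Valuations that fall into c for ALL values in column i."""
    return {t for t in FULL if all(put(t, i, a) in c for a in D)}

C = {(0, 0), (0, 1), (0, 2), (1, 1)}

print(sorted(outer_cyl(C, 2)))   # rows starting with 0 or 1
print(sorted(inner_cyl(C, 2)))   # only rows starting with 0

# Inner cylindrification as a derived operator.
for i in (1, 2):
    assert inner_cyl(C, i) == complement(outer_cyl(complement(C), i))
```

Union and intersection are the ordinary set operations `|` and `&` on such sets, so they need no separate definition here.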

We also need the notion of cylindric set algebra expressions.

Definition 3

Let be a sequence of infinite -dimensional cylinders and diagonals. The set of CA-expressions (over ) is obtained by closing the atomic expressions and under union, intersection, complement, and inner and outer cylindrifications. Then , the value of expression on sequence is defined in the usual way, e.g. , , etc.


Equivalence of FO and CA. In the next two theorems we will restate, in the context of the relational model, the correspondence between domain relational calculus and cylindric set algebra as query languages on instances [16, 17]. An expression in cylindric set algebra of dimension will be called a CA-expression. When translating an FO-formula to a CA-expression we first need to extend all -ary relations in to -ary by filling the last columns in all possible ways. Formally, this is expressed as follows:

Definition 4

The horizontal -expansion of an infinite -ary relation is

The equality relation is expanded into diagonals for , where

and for an instance , we have

Once an instance is expanded it becomes a sequence of -dimensional cylinders and diagonals, on which Cylindric Set Algebra Expressions can be applied.
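A minimal sketch of the expansion step, again over a small finite domain standing in for the infinite one. It pads a k-ary relation with all possible values in the remaining columns and builds a diagonal as the set of tuples whose columns i and j agree. The names `expand` and `diagonal` and the sample relation `R` are hypothetical.

```python
from itertools import product

D = ("a", "b", "c")   # finite stand-in for the infinite domain

def expand(relation, k, n):
    """Pad a k-ary relation to arity n by filling the last n-k columns in all possible ways."""
    assert all(len(t) == k for t in relation)
    return {t + pad for t in relation for pad in product(D, repeat=n - k)}

def diagonal(i, j, n):
    """All n-tuples whose columns i and j (1-based) are equal."""
    return {t for t in product(D, repeat=n) if t[i - 1] == t[j - 1]}

R = {("a", "b"), ("b", "b")}
R3 = expand(R, 2, 3)                 # each tuple of R yields |D| tuples
same_12 = R3 & diagonal(1, 2, 3)     # selection "column 1 = column 2"
print(sorted(same_12))
```

After expansion every relation has the same arity, so intersection with a diagonal behaves like a selection on equality, as in the star-cylinder example of the Introduction.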

The main technical difficulty in the translation from FO to CA is the correlation of the variables in the FO-sentence with the columns in the expanded relations in the instance. This can be achieved using a derived “swapping” operator that interchanges the columns and , where . (Footnote 4: This was already implicitly done in the expansion of in Definition 4. For a definition of swapping using the primitive operators, see Definition 1.5.12 in [16].) Every atom in will correspond to a CA-expression . However, for every occurrence of an atom in we need to interchange the columns with columns . This is achieved by the expression .
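One standard way to derive swapping from the primitive operators goes through substitutions of the form c_i(d_ij ∩ C), using a spare column k. The sketch below follows that route on a finite toy domain. Whether it coincides in every detail with Definition 1.5.12 of [16] is an assumption, and the function names are illustrative. It also shows why a spare column on which the cylinder is full is needed.

```python
from itertools import product

D = (0, 1)                           # finite stand-in for the infinite domain
N = 3                                # columns 1, 2 plus a spare column 3
FULL = set(product(D, repeat=N))

def put(t, i, a):
    """The tuple t with column i (1-based) set to a."""
    return t[:i - 1] + (a,) + t[i:]

def outer_cyl(c, i):
    return {t for t in FULL if any(put(t, i, a) in c for a in D)}

def diagonal(i, j):
    return {t for t in FULL if t[i - 1] == t[j - 1]}

def subst(c, i, j):
    """Substitution: tuples that land in c once column i is overwritten by column j."""
    return c if i == j else outer_cyl(diagonal(i, j) & c, i)

def swap(c, i, j, k):
    """Interchange columns i and j, using k as scratch space (c must be full on k)."""
    return subst(subst(subst(c, j, k), i, j), k, i)

# The binary relation {(0, 1)} expanded to three columns, hence full on column 3.
C = {(0, 1, 0), (0, 1, 1)}

swapped = swap(C, 1, 2, 3)
direct = {(t[1], t[0], t[2]) for t in C}   # swapping columns by hand
print(sorted(swapped))
```

The three substitutions move column j into the scratch column, copy i over j, and finally restore the saved value into i, much like swapping two variables through a temporary.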

Among the many identities holding in Cylindric Set Algebra we will in the sequel need the following ones

Proposition 1

[16]. Let be an -dimensional cylinder, and . Then

  1. If then

  2. If and then

Proposition 2

Let be pairwise distinct natural numbers, such that , and let be an -dimensional cylinder that is 2-full and -full. (Footnote 5: Cylinder is -full if .) Then

Proof:

The second equality follows from Theorem 1.5.18 in [16], the third equality holds since and , the fourth since . The last two equalities follow from Theorem 1.5.17 and 1.5.13 in [16], respectively.

The entire FO-formula with will then correspond to the CA-expression , where is defined recursively as follows:

  • If where , then

  • If , then .

  • If , then , if , then , and if , then .

  • If , then .

  • If , then .

For an example, let us reformulate the -query from (1) as

When translating the relation is first expanded to , and is expanded to . In order to correlate the variables in with the columns in the expanded databases, we do the shifts and . The equality was expanded to the diagonal so here the variables are already correlated. After this the conjunctions are replaced with intersections and the quantifiers with cylindrifications. Finally, the column corresponding to the free variable in (whose bindings will constitute the answer) is shifted to column 1. The final CA-expression will then be evaluated against  as

We now have . The following fundamental result follows from [16, 17], but we prove it here for the benefit of the readers who don’t want to consult [16, 17].

Theorem 1

For all FO-formulas , there is a CA expression , such that

for all instances .

Proof: We prove the stronger claim: For all FO-formulas , for all , with , there is a CA expression , such that

for all instances . The main claim then follows since , and the outermost sequence of swappings can be considered part of the final expression . In all cases below we assume wlog that so that the :st column can be used in the necessary swappings. (Footnote 6: If we can introduce an additional variable and the conjunct which would assure that the :st dimension is full. Alternatively, we could introduce swapping as a primitive in the algebra. This however would require a corresponding renaming operator in the FO-formulas, see [16].)

  • , where . We let We have

  • . We assume wlog that so that swaps can be performed. We let . We then have

  • , with . We assume wlog that . Then , and the inductive hypothesis is

    We have

  • , with , , , , and . (Footnote 7: The last assumption is needed in steps .) Now . The inductive hypothesis is

    We have

  • , with . Let