Four-Valued Semantics for Deductive Databases

In this paper, we introduce a novel approach to deductive databases meant to take into account the needs of current applications in the area of data integration. To this end, we extend the formalism of standard deductive databases to the context of Four-valued logic so as to account for unknown, inconsistent, true or false information under the open world assumption. In our approach, a database is a pair (E,R) where E is the extension and R the set of rules. The extension is a set of pairs of the form (f, v) where f is a fact and v is a value that can be true, inconsistent or false, but not unknown (that is, unknown facts are not stored in the database). The rules follow the form of standard Datalog¬ rules but, contrary to standard rules, their head may be a negative literal. Our main contributions are as follows: (i) we give an expression of first-degree entailment in terms of other connectors and exhibit a functionally complete set of basic connectors not involving first-degree entailment, (ii) we define a new operator for handling our new type of rules and show that this operator is monotonic and continuous, thus providing an effective way of defining and computing database semantics, and (iii) we argue that our framework allows for the definition of a new type of update that can be used in most standard data integration applications.


1 Introduction

In this paper, we present a novel approach meant to take into account the needs of many current applications, specifically in the domain of data integration. Our purpose is to extend the concept of deductive databases CeriGT90 ; Ullman to the context of Four-valued logic Belnap , a formalism known to be suitable for data integration, as it allows one to deal with unknown, inconsistent, true or false information. We begin by illustrating our approach through an example that serves as our running example throughout the paper.

Running Example. Our example concerns the storage of bags of rice grains, considering two important factors that (among others) influence the design and development of optimum storage, namely color and humidity of the rice grains Batay .

We assume that each bag is tested for the color and humidity of its rice grains in two different sites, first just before leaving the rice farm and then just before entering the warehouse. The outcomes of these tests can be: humid or not humid (with respect to a humidity threshold); and white or not white (with respect to a color threshold). Based on these outputs, the following actions are taken:

• If the grains are not humid and are white, then store the bags in the warehouse.

• If the grains are humid, then do not store the bags but cure the grains.

• If the grains are not white, then do not store the bags but analyze them further.

We assume that the tests are conducted by sensors: two sensors at the rice farm, one for humidity, denoted H1, and one for color, denoted W1; and two sensors at the warehouse, denoted H2 and W2. We also assume that, during a test, if a sensor is functioning then it returns a Boolean value (true or false); otherwise it returns no value. Under these assumptions, one of the following cases can occur for the sensors testing humidity (and similarly for the sensors testing color):

1. The two sensors return the same value.

2. The two sensors return different values.

3. Only one of the two sensors returns a value.

4. Neither of the two sensors returns a value.

In this setting, let Humid(i) denote the humidity state, or ‘value’, of a bag with identifier i. Then the question is: what value should be assigned to Humid(i) in each of the four cases above? In our formalism, we answer this question by ‘integrating’ the outputs of H1 and H2 as follows (and similarly for the outputs of W1 and W2):

1. Humid(i) is set to the common value returned by the sensors.

2. Humid(i) is set to inconsistent, to mean that the sensors returned different values.

3. Humid(i) is set to the value returned by the sensor which returned a value.

4. Humid(i) is set to unknown, to mean that neither of the two sensors returned a value.

As our example shows, we clearly need more than the standard truth values True and False to express cases 2 and 4 above. It will be seen that the Four-valued logic introduced in Belnap provides the right formalism, as it provides both the additional truth values needed and appropriate connectors to work with them. For instance, using a connector denoted by ⊕, we can express all four cases above in a single expression: Humid(i) = H1(i) ⊕ H2(i).
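The four integration cases can be replayed with a minimal Python sketch. It assumes the subset encoding of truth values discussed in Section 2.1 (t as {True}, b as {True, False}, n as the empty set, f as {False}), under which ⊕ amounts to set union; the function names `reading` and `integrate` are hypothetical and only serve this illustration.

```python
# Minimal sketch: Belnap truth values encoded as subsets of {True, False}.
T = frozenset({True})            # t: true
B = frozenset({True, False})     # b: both, i.e. inconsistent
N = frozenset()                  # n: none, i.e. unknown
F = frozenset({False})           # f: false

NAMES = {T: "t", B: "b", N: "n", F: "f"}


def reading(output):
    """Map a sensor output (True, False, or None for 'no value') to a truth value."""
    return N if output is None else frozenset({output})


def oplus(x, y):
    """Knowledge join: under the subset encoding it is plain set union."""
    return x | y


def integrate(out1, out2):
    """Integrated value of two readings, as in Humid(i) = H1(i) (+) H2(i)."""
    return NAMES[oplus(reading(out1), reading(out2))]


if __name__ == "__main__":
    print(integrate(False, False))  # case 1, same value: f
    print(integrate(True, False))   # case 2, different values: b
    print(integrate(True, None))    # case 3, one value: t
    print(integrate(None, None))    # case 4, no value: n
```

Note that a missing reading is simply treated as n, which is neutral for ⊕; this is why case 3 needs no special treatment.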

The database is a pair (E, R), where E collects the sensor outputs and R is a set of rules describing how to integrate these outputs and how to treat the bags based on the integrated values. Formally, the elements of E are pairs of the form (f, v), each representing the output of one sensor about a given bag, recognized by its identifier. In such a pair, f is a fact regarding the humidity or the color of a bag and v is its associated truth value. The rules expressing the integration of the sensor outputs and the conditions regarding the storage of the bags are as follows:

ρ1: Humid(x) ← H1(x) ⊕ H2(x)
ρ2: White(x) ← W1(x) ⊕ W2(x)
ρ3: Store(x) ← ¬Humid(x) ∧ White(x)
ρ4: ¬Store(x) ← Humid(x)
ρ5: Cure(x) ← Humid(x)
ρ6: ¬Store(x) ← ¬White(x)
ρ7: New_test(x) ← ¬White(x)

Although the rules above roughly look like standard Datalog rules with negation, the following basic differences should be noted:

1. The body of a rule is not restricted to be a conjunction of literals; in fact we allow all available connectors to occur in the body of a rule.

2. The head of a rule is not restricted to be an atom: negative literals are allowed, at the cost of generating contradictory facts.

3. Contradictions are allowed in database semantics and treated as such, in the context of the Four-valued semantics introduced in Belnap .

To illustrate how our approach deals with such rules, we first give a rough overview of the basic notions used in our approach. First, in Four-valued logic, four truth values are considered, namely t, b, n and f, standing respectively for true, inconsistent, unknown and false (the intuition behind the notation b and n will be clarified later in this paper).

In this context, the pieces of information to be stored in the database extension are pairs of the form (f, v), where f is a fact (i.e., an atom with no variable) and v is one of the four truth values just mentioned. By such a pair, which we call a valuated pair or v-pair for short, we mean that f has truth value v. Moreover, we adopt the intuitively appealing convention that unknown facts are not stored, meaning that the database extension cannot contain a v-pair of the form (f, n). We emphasize that, contrary to most database approaches, in which only true pieces of information are stored, our approach allows true, false or even inconsistent pieces of information to be stored.

Continuing with our example, assume there are three rice bags with identifiers 1, 2 and 3, for which the following sensor outputs are obtained and the corresponding v-pairs are stored in the database:

Regarding bag 1: H1 and H2 both return False; this results in storing the two v-pairs (H1(1), f) and (H2(1), f) in the database extension. W1 returns True but W2 returns no value; this results in storing the v-pair (W1(1), t) in the database extension.

Regarding bag 2: H1 returns True and H2 returns no value; this results in storing the v-pair (H1(2), t) in the database extension. W1 returns False while W2 returns True; this results in storing the two v-pairs (W1(2), f) and (W2(2), t) in the database extension.

Regarding bag 3: H1 and H2 both return no value, W1 returns False and W2 returns no value; this results in storing the v-pair (W1(3), f) in the database extension.

Roughly speaking, given a set of v-pairs, a rule is applied as follows: for every instantiation of the rule, the truth value of the instantiated body is computed against the set of v-pairs, and if this truth value is t or b, it is assigned to the head of the instantiated rule. Moreover, as more than one rule head may involve the same fact, conflicting assignments are integrated in the same way as the sensor outputs. We illustrate this processing below.

1. At the first step, the only rules that apply are ρ1 and ρ2.

• Based on the v-pairs (H1(1), f) and (H2(1), f), ρ1 generates the v-pair (Humid(1), f), stating that the grains in bag 1 are not humid.
As for identifier 2, since the output of H2 is missing, we consider the (non-stored) v-pair (H2(2), n), which, combined by ⊕ with the stored v-pair (H1(2), t), generates (Humid(2), t), stating that the grains in bag 2 are humid.
As for identifier 3, since neither H1 nor H2 returns a value, ρ1 generates no v-pair involving Humid(3), meaning that the humidity of the grains in bag 3 is unknown.

• As for bag 1, since W2 returns no value, ρ2 generates the v-pair (White(1), t), stating that the grains in bag 1 are white.
As for bag 2, we notice that W1 and W2 disagree. In this case, ρ2 generates the v-pair (White(2), b), meaning that the fact White(2) is inconsistent, thus that the color of the grains in bag 2 cannot be decided.
As for bag 3, since W2 returns no value, ρ2 generates the v-pair (White(3), f), meaning that the grains in bag 3 cannot be considered white.

2. The next step is based on the v-pairs generated earlier, namely: (Humid(1), f), (Humid(2), t), (White(1), t), (White(2), b) and (White(3), f). The rules apply as follows:

• Based on (Humid(1), f) and (White(1), t), ρ3 generates the v-pair (Store(1), t). Considering (Humid(2), t) and (White(2), b), since the conjunction in the body is false, ρ3 does not apply. Since Humid(3) is unknown and White(3) is false, the conjunction in the body is false, so ρ3 does not apply.

• Since Humid(1) is not true, ρ4 does not apply. Since Humid(2) is true, ρ4 generates (Store(2), f). Since Humid(3) is unknown, ρ4 does not apply.

• As above, since Humid(1) is not true, ρ5 does not apply, but ρ5 generates (Cure(2), t) because Humid(2) is true.

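• Similarly, since White(1) is not false, ρ6 and ρ7 do not apply. Since White(2) is inconsistent, ρ6 and ρ7 generate respectively (Store(2), b) and (New_test(2), b). Moreover, since White(3) is false, ρ6 and ρ7 generate respectively (Store(3), f) and (New_test(3), t).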

After applying the rules, conflicting v-pairs involving Store(2) appear, because Store(2) has been found false by ρ4 and inconsistent by ρ6. In this case, we integrate these different truth values in much the same way as we did for the sensor outputs, which yields that Store(2) is inconsistent (f ⊕ b = b). Therefore, the v-pair (Store(2), f) is removed from the result of this step.

3. As no further v-pair can be generated by the rules from the v-pairs generated in the previous steps, the processing stops and returns the set of all these v-pairs which, added to the database extension, constitutes what we call the database semantics.

The obtained database semantics is therefore the set of the following v-pairs:

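(H1(1), f), (H2(1), f), (W1(1), t),

(H1(2), t), (W1(2), f), (W2(2), t), (W1(3), f),

(Humid(1), f), (Humid(2), t),

(White(1), t), (White(2), b), (White(3), f),

(Store(1), t), (Store(2), b), (Store(3), f),

(Cure(2), t), (New_test(2), b), (New_test(3), t).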
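The informal computation above can be replayed with a small Python sketch. It assumes the subset encoding of truth values from Section 2.1, a naive round-based evaluation (this is not the immediate consequence operator of Definition 4), priority of the stored v-pairs over derived ones, and one detail read off the walkthrough: the integration rules ρ1 and ρ2 propagate any known body value (this is how (Humid(1), f) and (White(3), f) arise), whereas ρ3 to ρ7 fire only when their body is t or b. The encoding of rules as Python functions and every identifier below are hypothetical.

```python
# Sketch of the running example; all names are illustrative assumptions.
T, B, N, F = frozenset({True}), frozenset({True, False}), frozenset(), frozenset({False})
NAMES = {T: "t", B: "b", N: "n", F: "f"}


def neg(x):
    """Belnap negation: swap True and False inside the subset."""
    return frozenset(not a for a in x)


def conj(x, y):
    """Meet for the truth ordering."""
    out = set()
    if True in x and True in y:
        out.add(True)
    if False in x or False in y:
        out.add(False)
    return frozenset(out)


def oplus(x, y):
    """Join for the knowledge ordering (set union)."""
    return x | y


BAGS = (1, 2, 3)
E = {("H1", 1): F, ("H2", 1): F, ("W1", 1): T,
     ("H1", 2): T, ("W1", 2): F, ("W2", 2): T,
     ("W1", 3): F}

# (name, head predicate, positive head?, integration rule?, body)
RULES = [
    ("rho1", "Humid", True, True, lambda v, x: oplus(v("H1", x), v("H2", x))),
    ("rho2", "White", True, True, lambda v, x: oplus(v("W1", x), v("W2", x))),
    ("rho3", "Store", True, False, lambda v, x: conj(neg(v("Humid", x)), v("White", x))),
    ("rho4", "Store", False, False, lambda v, x: v("Humid", x)),
    ("rho5", "Cure", True, False, lambda v, x: v("Humid", x)),
    ("rho6", "Store", False, False, lambda v, x: neg(v("White", x))),
    ("rho7", "New_test", True, False, lambda v, x: neg(v("White", x))),
]


def step(state):
    """Fire all ground rules once against `state`; merge conflicts with oplus."""
    def value(pred, x):
        return state.get((pred, x), N)  # absent facts are unknown

    derived = {}
    for _name, head, positive, integration, body in RULES:
        for x in BAGS:
            result = body(value, x)
            fires = (result != N) if integration else (result in (T, B))
            if fires:
                head_value = result if positive else neg(result)
                derived[(head, x)] = oplus(derived.get((head, x), N), head_value)
    derived.update(E)  # stored v-pairs take priority over derived ones
    return derived


def semantics(max_rounds=100):
    """Iterate `step` from the extension until nothing changes."""
    state = dict(E)
    for _ in range(max_rounds):
        new_state = step(state)
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no fixpoint reached")


if __name__ == "__main__":
    for (pred, x), v in sorted(semantics().items()):
        print(f"({pred}({x}), {NAMES[v]})")
```

Running it lists the eighteen v-pairs above; the conflict on Store(2) is resolved by ⊕ inside `step`, exactly where the walkthrough integrates the values found by ρ4 and ρ6.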

It is shown in this paper that the computation just described informally is sound, and its relationship with other related approaches is investigated. Moreover, some basic properties of the underlying Four-valued logic are stated; among them, this example raises the following question: could the rules ρ4 and ρ6 be replaced by the single rule ¬Store(x) ← Humid(x) ∨ ¬White(x)? Whereas this question is answered positively in standard approaches to Datalog databases (CeriGT90 ; Ullman ) and in the Four-valued approach of Fitting91 , we argue that this replacement raises some issues.

This work is an extension of that in Lau2019 where rule bodies are restricted to be conjunctions. The main contributions of this paper are as follows:

1. We show that FDE (First Degree Entailment) implication, one of the standard implications in Four-valued logic, can be expressed in terms of the usual connectors.

2. We exhibit a functionally complete set of basic connectors not involving FDE implication, contrary to the results in Arieli1998 .

3. We generalize the rules by allowing negative literals in their heads and connectors other than negation, conjunction and disjunction in their bodies.

4. We define a new immediate consequence operator for handling such rules, and we show that this operator is monotonic and continuous, thus providing an effective way for defining and computing database semantics.

5. We argue that our context allows for the definition of a new type of updates that can be used in data integration applications. Notice that to the best of our knowledge, the problem of database updating in a Four-valued logic framework has never been addressed in the literature.

The paper is organized as follows. In Section 2, we review the formalism of Four-valued logic and address the first two issues mentioned above. Section 3 is devoted to the definitions of the syntax and the semantics of databases in the context of Four-valued logic. In Section 4, we define two types of updates, one standard and the other related to data integration. Then, in Section 5, we review some approaches from the literature that are related to our work. Section 6 provides an overview of our approach and suggests research issues that we are currently investigating or intend to investigate in the near future.

2 Background: Four-Valued Logic

2.1 Basics of Four-Valued Logic

Four-valued logic was introduced in Belnap , where it is argued that this formalism could be of interest when integrating data from various sources. To this end, denoting by t, b, n and f the four truth values, the usual connectives $\neg$, $\wedge$ and $\vee$ are defined as shown in Figure 1. An important feature of this Four-valued logic is that it allows truth values to be compared according to two partial orderings, known as the truth ordering and the knowledge ordering, respectively denoted by $\leq_t$ and $\leq_k$ and defined by:

$f \leq_t n \leq_t t$; $f \leq_t b \leq_t t$    and    $n \leq_k f \leq_k b$; $n \leq_k t \leq_k b$.

To explain the choice of b and n as notation for inconsistent and unknown, consider the set {True, False} of the usual truth values. The four truth values of Four-valued logic can then be thought of as corresponding to the elements of the power set of {True, False}, by associating t, b, n and f with {True}, {True, False}, $\emptyset$ and {False}, respectively. Then the notations n and b can be read as none and both, respectively. Notice also that, under this association, the ordering $\leq_k$ and the connectors $\oplus$ and $\otimes$ are nothing but the restrictions to the power set of {True, False} of set-theoretic inclusion, union and intersection, respectively.

As in standard two-valued logic, conjunction (respectively disjunction) corresponds to the minimum (respectively maximum) truth value with respect to the truth ordering. It has also been shown in Belnap ; Fitting91 that the set {t, b, n, f} equipped with these two orderings has a distributive bilattice structure, where the minimum and maximum with respect to $\leq_k$ are denoted by $\otimes$ and $\oplus$, respectively.
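To make the two orderings concrete, here is a minimal Python sketch assuming the subset encoding of truth values described above; all function names are hypothetical. It defines $\neg$, $\wedge$, $\vee$, $\oplus$ and $\otimes$ together with $\leq_t$ and $\leq_k$, and prints the binary truth tables, so that one can check that $\wedge$ and $\vee$ are the minimum and maximum for $\leq_t$ while $\otimes$ and $\oplus$ are those for $\leq_k$.

```python
# Sketch: Belnap's connectives and orderings under the subset encoding.
from itertools import product

T, B, N, F = frozenset({True}), frozenset({True, False}), frozenset(), frozenset({False})
VALUES = (T, B, N, F)
NAMES = {T: "t", B: "b", N: "n", F: "f"}


def neg(x):
    return frozenset(not a for a in x)


def conj(x, y):
    # True is supported only if both arguments support it; False if either does.
    return frozenset(({True} if True in x and True in y else set())
                     | ({False} if False in x or False in y else set()))


def disj(x, y):
    # Dual of conj.
    return frozenset(({True} if True in x or True in y else set())
                     | ({False} if False in x and False in y else set()))


def oplus(x, y):   # maximum w.r.t. the knowledge ordering
    return x | y


def otimes(x, y):  # minimum w.r.t. the knowledge ordering
    return x & y


def leq_t(x, y):
    # Moving up in truth: never lose True, never gain False.
    return (True not in x or True in y) and (False not in y or False in x)


def leq_k(x, y):
    # Moving up in knowledge: set inclusion.
    return x <= y


if __name__ == "__main__":
    print("x y | and or oplus otimes")
    for x, y in product(VALUES, repeat=2):
        print(NAMES[x], NAMES[y], "|", *(NAMES[op(x, y)] for op in (conj, disj, oplus, otimes)))
```

For instance, b ∧ n evaluates to f and b ∨ n to t, which are the familiar entries of Belnap's tables.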

Not surprisingly, some basic properties holding in standard logic do not hold in this Four-valued logic. For example, Figure 1 shows that formulas of the form $\varphi \vee \neg\varphi$ are not true for every truth value of $\varphi$. More importantly, it has been argued in Arieli1998 ; Hazen17 ; Tsoukias that defining implication as $\neg\varphi \vee \psi$ is problematic.

To see this, we consider, as in Belnap ; Arieli1998 ; Hazen17 ; Tsoukias , that t and b are the two designated truth values because, as mentioned above, they are the only truth values corresponding to sets containing True. As a consequence, a formula is said to be valid if its truth value is designated, i.e., either t or b.

As argued in Arieli1998 ; Hazen17 ; Tsoukias , does not satisfy the deduction theorem, because the formula defined by is not valid for every truth value assignment. Indeed based on Figure 2, for every assignment such that and , we have and thus, . As a consequence, we discard as the implication providing semantics to our rules.
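A simple instance of the problem can be checked mechanically. Since $\varphi \vdash \varphi$ holds trivially, the deduction theorem would require $\varphi \to \varphi$ to be valid; with implication read as $\neg\varphi \vee \psi$, the Python sketch below (subset encoding of Section 2.1; names hypothetical) shows that $\varphi \to \varphi$ takes the value n, hence is not valid, when $\varphi$ is unknown. This is only an illustration, not necessarily the formula used in the argument above.

```python
# Sketch: implication read as "not phi or psi" under the subset encoding.
T, B, N, F = frozenset({True}), frozenset({True, False}), frozenset(), frozenset({False})
VALUES = (T, B, N, F)
NAMES = {T: "t", B: "b", N: "n", F: "f"}
DESIGNATED = (T, B)  # the truth values containing True


def neg(x):
    return frozenset(not a for a in x)


def disj(x, y):
    return frozenset(({True} if True in x or True in y else set())
                     | ({False} if False in x and False in y else set()))


def material(x, y):
    """Implication defined as (not x) or y."""
    return disj(neg(x), y)


if __name__ == "__main__":
    for x in VALUES:
        value = material(x, x)
        print(f"phi={NAMES[x]}: phi -> phi = {NAMES[value]}, valid: {value in DESIGNATED}")
```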

Among the various implications introduced in the literature, First Degree Entailment implication, or FDE implication, denoted hereafter by $\supset$ (Arieli1998 ; Hazen17 ), is the most popular. We also mention another implication, introduced in Tsoukias . Each of these implications is associated with another implication, whose role is explained next. The truth tables of all these implications are shown in Figure 2.

Recall from Arieli1998 (Corollary 9) that $\supset$ is defined ‘from scratch’, in the sense that it cannot be expressed using the other standard connectives $\neg$, $\wedge$ and $\vee$. As we shall see shortly, we can provide an expression of $\supset$ involving standard connectors of the formalism of Tsoukias . It is also important to notice that, as shown in Tsoukias , the implication of Tsoukias is defined by means of a complement operator whose truth table is shown in Figure 1.

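Moreover, since $\varphi \supset \psi$ and $\neg\psi \supset \neg\varphi$ are not equivalent, a second implication is introduced in Arieli1998 ; Hazen17 as a shorthand for $(\varphi \supset \psi) \wedge (\neg\psi \supset \neg\varphi)$. As a similar situation holds for the implication of Tsoukias , a corresponding associated implication is defined in Tsoukias .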

In an attempt to compare these implications, we notice that, contrary to , the formula defined by is valid when replacing with one of the implications , , or . It is also interesting to see that when merging the truth values t and b (respectively f and n) into a single value, say TRUE (respectively FALSE), the corresponding truth tables of and are that of the standard implication, while this is not the case for , and . This explains why we discard these three implications. However, the choice between and is not easy for the following reasons:

• In Arieli1998 ; Hazen17 , it is argued that, similarly to two-valued implication, satisfies the property that whenever is designated. However, does not satisfy the properties of given below.

• Although does not satisfy the above property, it is argued in Tsoukias that, similarly to two-valued implication, satisfies the property that if and only if .

We draw attention to the fact that neither of these two implications satisfies all the intuitively appealing properties satisfied by standard two-valued implication; contraposition is one example.

Looking at the truth tables of the two implications and , when the left hand side is valid in , it is necessary that the right hand side be also valid in order to make the implication valid. More precisely, if is valid, the implications and are valid in for any truth assignment such that:

and or ,

and or .

As a consequence, if it happens that is valid while is not, the implication can be made valid by changing the truth value of in two ways: making it either true or inconsistent. As will be seen later, we choose to set as equal to . This choice is motivated by the fact that it is the only one satisfying and .

To see how to express FDE implication in terms of the basic connectors , , , , and of Tsoukias , we recall that is defined for every formula by:

.

Moreover, the additional connectors whose truth tables are shown in Figure 3 allow each truth value to be ‘characterized’ in terms of the standard values only, namely t and f. Roughly speaking, the connector associated with a truth value x is defined, for every formula $\varphi$, by the fact that it yields true if $\varphi$ has the truth value x, and false otherwise.
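The characterization connectors can be sketched as a single Python function that builds, for each truth value x, a unary connector returning t exactly on x and f elsewhere. The name `characterize` and the string encoding of truth values are assumptions of this sketch; Tsoukias defines these connectors within his own formalism, which is not reproduced here.

```python
# Sketch: one "characterization" connector per truth value.
VALUES = ("t", "b", "n", "f")


def characterize(x):
    """Return the unary connector that yields 't' on x and 'f' on any other value."""
    def connector(value):
        return "t" if value == x else "f"
    return connector


CONNECTORS = {x: characterize(x) for x in VALUES}

if __name__ == "__main__":
    # Rows are connectors, columns are input truth values.
    print("     " + "  ".join(VALUES))
    for x in VALUES:
        print(f"[{x}]  " + "  ".join(CONNECTORS[x](y) for y in VALUES))
```

Every output is two-valued, and for each input exactly one of the four connectors yields t.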

In what follows, equivalent formulas and are defined as formulas having the same truth tables, which is denoted by . Using this notation, it is shown in Tsoukias that for each of these connectors, the following equivalences hold:

; ; ; .

We now consider an additional connector denoted by , and defined as follows:

.

This new connector ‘characterizes’ the non-validity of a formula in terms of the truth values t and f. In other words, as shown in Figure 3, it yields true when applied to a formula that is not valid, and false otherwise.

An important point is that this new connector allows for an intuitively appealing expression of the FDE implication $\supset$ (Arieli1998 ; Hazen17 ). Indeed, it is easy to show, based on the truth tables of Figures 2 and 3, that for all formulas $\varphi$ and $\psi$, the following equivalence holds:

.

Since the new connector applied to $\varphi$ can be read as true if $\varphi$ is not valid and false otherwise, the equivalence above suggests that $\varphi \supset \psi$ can be read as either $\varphi$ is not valid or $\psi$ is valid. We emphasize that this closely mirrors implication in standard first-order logic, which is read as either $\neg\varphi$ is true or $\psi$ is true.
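The equivalence can be checked exhaustively. In the following Python sketch (subset encoding of truth values; all names hypothetical), `fde` encodes $\supset$ under the assumption that it returns $\psi$ when $\varphi$ is designated and t otherwise, and `not_valid` encodes the new connector as the disjunction of the characterizations of n and f, which yields exactly the behavior described in the text. The script then compares $\varphi \supset \psi$ with the disjunction of `not_valid(φ)` and $\psi$ on all sixteen pairs.

```python
from itertools import product

T, B, N, F = frozenset({True}), frozenset({True, False}), frozenset(), frozenset({False})
VALUES = (T, B, N, F)
NAMES = {T: "t", B: "b", N: "n", F: "f"}
DESIGNATED = (T, B)


def disj(x, y):
    return frozenset(({True} if True in x or True in y else set())
                     | ({False} if False in x and False in y else set()))


def fde(x, y):
    """Assumed FDE table: psi if phi is designated, t otherwise."""
    return y if x in DESIGNATED else T


def characterize(target):
    """Connector returning t on `target` and f elsewhere."""
    return lambda value: T if value == target else F


def not_valid(x):
    """t when x is n or f (i.e. not designated), f otherwise."""
    return disj(characterize(N)(x), characterize(F)(x))


if __name__ == "__main__":
    for x, y in product(VALUES, repeat=2):
        lhs, rhs = fde(x, y), disj(not_valid(x), y)
        print(NAMES[x], NAMES[y], NAMES[lhs], NAMES[rhs], lhs == rhs)
```

The check works because `not_valid` only ever returns t or f: f is neutral for ∨, and t absorbs any $\psi$.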

Based on these remarks and on the truth tables in Figures 1 to 3, the following proposition holds. The first item in this proposition is commented on in the next section.

Proposition 1

Given formulas , and , the following equivalences hold:

.

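Functional completeness can be stated in our context as follows: given a function $g$ from $\{t, b, n, f\}^k$ to $\{t, b, n, f\}$, where $k$ is a positive integer, can $g$ be ‘expressed’ as a formula involving $k$ propositional variables $p_1, \ldots, p_k$? More formally, given $g$, the problem is to prove that there exists a formula $\Phi_g$ such that, for every $(x_1, \ldots, x_k)$ in $\{t, b, n, f\}^k$, if $v$ is a valuation such that $v(p_i) = x_i$ for $i = 1, \ldots, k$, then $v(\Phi_g) = g(x_1, \ldots, x_k)$.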

This question has been answered positively in Arieli1998 where the proposed formula involves the connectors , and and the constants and . The authors give also some other variants of this result by proposing various sets of connectors, all of which containing the implication .

Given that can be expressed as , functional completeness can also be shown based on the connectors introduced in Tsoukias , that is , , , , and , but not . We prove this result in two ways: one based on Arieli1998 , and one more direct, using the connectors defined in Tsoukias .

Proof based on Arieli1998 . In Arieli1998 , it is shown that the language is functionally complete, meaning that for every and every function from to there exists a formula in involving propositional variables such that, for in , if is a valuation such that for , , then

Thus, given from to , by replacing in every occurrence of by we obtain a formula that, using the definitions of and of the connectors N and F, can be expressed by using the basic connectors and the four truth values.

Direct proof based on Tsoukias . Based on the connectors , , and introduced in Tsoukias , every in is associated with a formula defined as follows:

where, for , if , if , if and if .

It is thus easy to see that if for , and otherwise.

Now, given a function from to , we consider the partition induced by on , defined by . For every truth value in , the corresponding element of this partition, which is a subset of , is associated with a formula defined by:

.

It can be seen that for every in , if is in , and otherwise. The targeted formula is defined by:

.

The proof that is indeed the expected formula is done by successively considering the four possible truth values. For , consider the following cases:

• In this case, we have that . On the other hand, if is such that for , , , , and , evaluates as . Thus, .

• In this case, we have that . On the other hand, if is such that for , , , , and , evaluates as . Thus, .

• In this case, we have that . On the other hand, if is such that for , , , , and , evaluates as . Thus, .

• In this case, we have that . On the other hand, if is such that for , , , , and , evaluates as . Thus, .

As a consequence, we obtain that thus that the formula has the same truth values as the truth values defined by the function .
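The direct construction can be turned into a small synthesizer. The Python sketch below assumes one standard way of closing the construction: the target formula is the disjunction, over the four truth values x, of the conjunction of the class formula for x with the constant x (a choice of ours, verified by the script rather than taken from the text). Formulas are nested tuples (a hypothetical encoding) built only from variables, constants, characterization connectors, $\wedge$ and $\vee$, and `evaluate` checks that the synthesized formula agrees with the given function on every input.

```python
from functools import reduce
from itertools import product

T, B, N, F = frozenset({True}), frozenset({True, False}), frozenset(), frozenset({False})
VALUES = (T, B, N, F)


def conj(x, y):
    return frozenset(({True} if True in x and True in y else set())
                     | ({False} if False in x or False in y else set()))


def disj(x, y):
    return frozenset(({True} if True in x or True in y else set())
                     | ({False} if False in x and False in y else set()))


def evaluate(phi, valuation):
    """Evaluate a formula (nested tuples) under a valuation (tuple of truth values)."""
    kind = phi[0]
    if kind == "var":
        return valuation[phi[1]]
    if kind == "const":
        return phi[1]
    if kind == "char":  # characterization connector for the value phi[1]
        return T if evaluate(phi[2], valuation) == phi[1] else F
    if kind == "and":
        return conj(evaluate(phi[1], valuation), evaluate(phi[2], valuation))
    if kind == "or":
        return disj(evaluate(phi[1], valuation), evaluate(phi[2], valuation))
    raise ValueError(f"unknown formula kind: {kind}")


def fold(op, formulas, empty):
    """Combine formulas with a binary connective; `empty` if there are none."""
    return reduce(lambda a, b: (op, a, b), formulas) if formulas else empty


def synthesize(g, k):
    """Build a formula over variables 0..k-1 whose truth table is g."""
    disjuncts = []
    for x in VALUES:
        # Class formula for x: t exactly on the input tuples that g maps to x.
        rows = [a for a in product(VALUES, repeat=k) if g(*a) == x]
        chis = [fold("and", [("char", a_i, ("var", i)) for i, a_i in enumerate(a)], ("const", T))
                for a in rows]
        phi_x = fold("or", chis, ("const", F))
        disjuncts.append(("and", phi_x, ("const", x)))
    return fold("or", disjuncts, ("const", F))


if __name__ == "__main__":
    target = lambda x, y: x | y  # example function: the knowledge join
    phi = synthesize(target, 2)
    print(all(evaluate(phi, a) == target(*a) for a in product(VALUES, repeat=2)))
```

Each class formula evaluates to t or f only, so conjoining it with the constant x yields either x or f, and f is neutral for the final disjunction.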

3 Four-Valued Logic and Databases

3.1 Database Syntax

As usual when dealing with deductive databases, the considered alphabet is made of constants, variables and predicate symbols with a fixed arity. We thus assume a fixed set of constants, called the universe. It should be noted that the universe may be infinite.

As in traditional approaches, a term is either a constant from the universe or a variable, and an atomic formula, or atom, is a formula of the form $P(\tau_1, \ldots, \tau_k)$, where $P$ is a $k$-ary predicate and, for every $i = 1, \ldots, k$, $\tau_i$ is a term. A formula is said to be ground if it contains no variables. A fact is a ground atom, that is, an atom in which all terms are constants. Moreover, a literal is either an atom or the negation of an atom; in the former case the literal is said to be positive, and in the latter case negative. The Herbrand base associated with the universe is the set of all facts that can be built up using the constants of the universe and the predicates. Clearly, if the universe is infinite, then so is the Herbrand base.

In the traditional two-valued setting under the CWA (Closed World Assumption, Reiter77 ), the database extension and the database semantics are sets of facts, meant to be true, and the facts not in the database semantics are taken to be false. In our context of Four-valued logic under the OWA (Open World Assumption), the database extension and the database semantics may contain facts that are true, inconsistent or false, non-stored facts being assumed unknown. To account for this situation, we consider sets of pairs of the form $(\varphi, v)$, where $\varphi$ is a fact of the Herbrand base and $v$ is one of the values t, b or f, while facts whose truth value is n are not stored. Moreover, such a set $S$ is said to be consistent if, for all distinct pairs $(\varphi, v)$ and $(\varphi', v')$ in $S$, $\varphi \neq \varphi'$. Consequently, a consistent set $S$ is seen as a valuation defined for every fact $\varphi$ of the Herbrand base by:

$S(\varphi) = v$ if $S$ contains a pair $(\varphi, v)$; $S(\varphi) = n$ otherwise.

Consistent sets of pairs are called v-sets, standing for valuated sets.

Given a v-set and a ground formula , is said to be valid in if is designated. For example, is valid in , because , but is not valid in because .
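Under the conventions above, a v-set behaves like a finite map from facts to t, b or f, read with default n. The Python sketch below (hypothetical names; facts are plain strings, and only facts, not arbitrary ground formulas, are checked for validity) builds a v-set from pairs, rejecting n values and conflicting pairs, and tests validity.

```python
STORED = ("t", "b", "f")      # n is never stored
DESIGNATED = ("t", "b")


def to_vset(pairs):
    """Build a v-set (dict fact -> value) from (fact, value) pairs.

    Raises ValueError on an n value or on two distinct pairs for the same fact.
    """
    vset = {}
    for fact, value in pairs:
        if value not in STORED:
            raise ValueError(f"{fact}: value {value!r} cannot be stored")
        if vset.get(fact, value) != value:
            raise ValueError(f"{fact}: conflicting values {vset[fact]!r} and {value!r}")
        vset[fact] = value
    return vset


def valuation(vset, fact):
    """Truth value of a fact: the stored value, or n if the fact is absent."""
    return vset.get(fact, "n")


def is_valid(vset, fact):
    """A fact is valid in the v-set when its value is designated (t or b)."""
    return valuation(vset, fact) in DESIGNATED


if __name__ == "__main__":
    s = to_vset([("Store(1)", "t"), ("Store(2)", "b"), ("Store(3)", "f")])
    for fact in ("Store(1)", "Store(2)", "Store(3)", "Cure(1)"):
        print(fact, valuation(s, fact), is_valid(s, fact))
```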

The two orderings $\leq_t$ and $\leq_k$ are extended pointwise to v-sets over the same Herbrand base, as follows.

Definition 1

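For all v-sets $S_1$ and $S_2$ over the Herbrand base, $S_1 \leq_t S_2$ (respectively $S_1 \leq_k S_2$) holds if, for every fact $\varphi$, $S_1(\varphi) \leq_t S_2(\varphi)$ (respectively $S_1(\varphi) \leq_k S_2(\varphi)$) holds.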

For example for , and , we have . Thus:

• , and , implying that holds.

• , and , implying that holds.

• , because for every , , the least value with respect to .

• and are not comparable with respect to , because and are not comparable with respect to .

The extension of $\leq_k$ generalizes set inclusion in the sense that if $S_1 \subseteq S_2$, then $S_1 \leq_k S_2$. Notice that, as the last item above shows, the truth ordering does not satisfy this property: $S_1 \subseteq S_2$ does not imply $S_1 \leq_t S_2$.
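The pointwise orderings can be sketched in Python as follows, with truth values as strings and v-sets as dictionaries (both hypothetical encodings). The example shows that the empty v-set lies below {(p, b)} for $\leq_k$, as set inclusion predicts, while the two are incomparable for $\leq_t$.

```python
# Truth values as strings; the strict parts of the two orderings listed explicitly.
_LT_T = {("f", "n"), ("f", "b"), ("f", "t"), ("n", "t"), ("b", "t")}
_LT_K = {("n", "f"), ("n", "t"), ("n", "b"), ("f", "b"), ("t", "b")}


def leq_t(x, y):
    return x == y or (x, y) in _LT_T


def leq_k(x, y):
    return x == y or (x, y) in _LT_K


def pointwise(leq, s1, s2):
    """Compare two v-sets (dict fact -> value) fact by fact, absent facts being n.

    Facts stored in neither v-set are n on both sides and compare trivially,
    so only the stored facts need to be checked.
    """
    return all(leq(s1.get(p, "n"), s2.get(p, "n")) for p in set(s1) | set(s2))


if __name__ == "__main__":
    empty, s = {}, {"p": "b"}
    print(pointwise(leq_k, empty, s))                                # True
    print(pointwise(leq_t, empty, s), pointwise(leq_t, s, empty))    # False False
```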

In our context, as in approaches to Datalog databases (CeriGT90 ; Bidoit91 ), a database consists of an extension and a set of rules, formally defined as follows.

Definition 2

A database is a pair $(E, R)$, where $E$ and $R$ are respectively called the extension and the rule set of the database, such that:

• $E$ is a v-set.

• is a set of rules of the form where the variables in are free in and and the variables in are free in , and

1. is a well formed formula involving the connectors , , , and . is called the body of , denoted by .

2. is a positive or negative literal, called the head of , denoted by .

It should be clear that the rules as defined above generalize standard Datalog rules (Bidoit91 ). On the other hand, the definition above also generalizes rules as defined in Lau2019 where the bodies of the rules are restricted to be conjunctions only. Moreover, in our approach and contrary to Fitting91 ; Bidoit91 , rules may generate contradictory facts. It is important to notice that our approach is closely related to the generalized rules as introduced in Fitting91 , with the following notable differences:

1. In our approach, negative literals are allowed in the rule heads, which is not the case in Fitting91 .

2. In our approach, several rules may have the same predicate involved in their head, which is not the case in Fitting91 . This important point will be discussed later.

3. In our approach, quantifiers are not allowed, whereas in Fitting91 four quantifiers are allowed ( and associated with and and associated with ).

3.2 Database Semantics

As usual, rules are seen as implications, either or that must be valid in the database semantics. Notice in this respect that Figure 2 shows that for all formulas and , is valid if and only if so is . This explains why in Lau2019 , our approach has been shown to be ‘compatible’ with either implication. Here, we focus on FDE implication , thus forgetting the implication of Tsoukias .

Similarly to the standard Datalog approach, a model of a database $(E, R)$ could be defined as a v-set containing $E$ in which all rules in $R$ are valid. However, such a definition would raise important problems:

1. A database might have no model. To see this, consider where and where , . Then in any model , because must contain the two pairs of . Notice that this cannot happen in standard Datalog since the storage of false facts is not allowed.

2. A database might have more than one minimal model, with respect to set inclusion. This case is illustrated above where , are , two minimal v-sets containing in which is valid. This situation does not happen in standard Datalog because the minimal model is known to be unique.

Whereas the second issue raised above will be investigated further later, the first issue is solved in our approach by giving priority to the database extension over the rules. To do so, we prevent a rule of $R$ from being applied when this leads to a conflict with a v-pair of $E$.

In order to implement this policy, given a database over universe , we denote by the set of all instantiations of rules in such that does not occur in . Moreover, given a rule we denote by the formula . The definition of a model of then follows.

Definition 3

Let be a database. A v-set is a model of if the following holds:

, i.e., must contain the database extension, and

every of is valid in , that is, is designated.

To illustrate Definition 3, consider the following simple examples:

• with , and . is a model of as . It is easy to see that is the only minimal model with respect to set inclusion.

• with and , and , are two models of . Moreover, it can be seen that these two models are minimal with respect to set inclusion.

Given a database , an immediate consequence operator is defined below. It will then be seen that this allows for computing a particular model of , which we call the semantics of .

Definition 4

Let be a database. The semantic immediate consequence operator associated with , denoted by , is defined for every v-set by the following steps:

Define first as follows: