Towards Universal Languages for Tractable Ontology Mediated Query Answering

11/26/2019 ∙ by Heng Zhang, et al. ∙ Nankai University, Western Sydney University, University of Alberta, Tianjin University

An ontology language for ontology mediated query answering (OMQA-language) is universal for a family of OMQA-languages if it is the most expressive one among this family. In this paper, we focus on three families of tractable OMQA-languages, including first-order rewritable languages and languages whose data complexity of the query answering is in AC0 or PTIME. On the negative side, we prove that there is, in general, no universal language for each of these families of languages. On the positive side, we propose a novel property, the locality, to approximate the first-order rewritability, and show that there exists a language of disjunctive embedded dependencies that is universal for the family of OMQA-languages with locality. All of these results apply to OMQA with query languages such as conjunctive queries, unions of conjunctive queries and acyclic conjunctive queries.


Introduction

Ontology mediated query answering (OMQA) is a paradigm that generalizes the traditional database querying by enriching the database with a domain ontology [12]. This paradigm has played an important role in the semantic web [6, 1], data modelling [3], data exchange [9] and data integration [11], and has recently emerged as one of the central issues in knowledge representation as well as in databases.

A long-standing major topic for OMQA is to identify proper languages for specifying ontologies. A large number of ontology languages have been proposed for OMQA since the mid-2000s. For instance, in description logics, the DL-Lite family [6], the $\mathcal{EL}$-family [1] and other variants have been proposed and extensively studied. More recently, the Datalog± family, a.k.a. existential rule languages, or dependencies in databases, has been rediscovered as a promising class of languages for OMQA; see, e.g., [2, 4, 5]. Most of these languages enjoy good computational properties such as first-order rewritability or PTIME data complexity.

While all these languages have their own specific features and hence are useful in different applications, it is not realistic to implement OMQA-systems for all of them. So a natural question arises: can we find the largest language (in terms of expressiveness) among the family of first-order rewritable (or PTIME-tractable) OMQA-languages? Let us call the largest language in the above sense a universal language. Clearly, it is of great theoretical and practical importance to determine whether a universal language exists w.r.t. some kind of tractability, which is also the main task of this paper.

It is worth noting that universality is one of the major principles for designing languages in both computer science and logic. For example, almost all the traditional programming languages, including C, Java and Prolog, are known to be universal for the family of Turing complete programming languages; propositional logic can express all boolean functions; and by the well-known Lindström theorem, first-order logic is the largest one among the logics that enjoy both compactness and the Löwenheim-Skolem property; see, e.g., [7]. In databases, the first-order language is shown to be universal for the family of query languages with data complexity in AC0, and Datalog is universal for the family of query languages with data complexity in PTIME; see, e.g., [10].

Some recent work in OMQA has been done along the line of identifying universal languages. In [CalvaneseGLLR13] it was proved that, under a certain syntactic classification, some languages in the DL-Lite family are the maximal fragments of description logic with first-order rewritability. By regarding OMQA as traditional database querying, [GRS2014] showed that weakly-guarded tuple-generating dependencies (TGDs) capture the class of EXPTIME queries; [RudolphT2015] proved that general TGDs capture the class of recursively enumerable queries. In the setting of schema mapping, [ZhangZY15] showed that the language of weakly-acyclic TGDs is universal for languages of TGDs with finite semi-oblivious chase. All of these results shed new light on the expressiveness of existential rules, but it is worth noting that OMQA is significantly different from both traditional database querying and schema mapping. To understand the expressiveness in the framework of OMQA, [ZhangZY16] proved that the language of disjunctive embedded dependencies is universal for the family of recursively enumerable OMQA-languages. Along this line, this paper focuses on tractable OMQA.

Aiming to identify universal languages for tractable OMQA, in this paper we focus on three families of OMQA-languages, including first-order rewritable languages and languages whose data complexity is in AC0 or PTIME. Our contributions are summarized as follows. On the one hand, we prove that there is, in general, no universal language for each of the above families of languages. On the other hand, by restricting the number of database constants involved in query answering, we propose a novel property, called locality, to approximate first-order rewritability, and establish the existence of a universal language for the family of local OMQA-languages. All of these results hold for OMQA with query languages such as conjunctive queries, unions of conjunctive queries and acyclic conjunctive queries.

Preliminaries

Databases and Instances. We use a countably infinite set $\Delta$ of constants and a countably infinite set $\Delta_v$ of variables, and assume they are disjoint. Every term is either a constant or a variable. A relational schema $\mathcal{R}$ consists of a set of relation symbols. Each relation symbol has an arity, which is a natural number. An atom over $\mathcal{R}$ (or $\mathcal{R}$-atom) is either an equality or a relational atom built upon terms and relation symbols in $\mathcal{R}$. A fact is a variable-free relational atom. Each instance over $\mathcal{R}$ (or $\mathcal{R}$-instance) consists of a set of facts over $\mathcal{R}$. Instances that are finite are called databases. Suppose $I$ is an instance. Let $\mathrm{adom}(I)$ denote the set of constants that occur in $I$. Let $\mathrm{DB}[\mathcal{R}]$ denote the class of all databases over schema $\mathcal{R}$. Given a set $A$ of constants, by $I|_A$ we denote the subset of $I$ in which each fact involves only constants in $A$.

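Let $I$ and $J$ be instances over a relational schema $\mathcal{R}$, and $C \subseteq \Delta$. Then every $C$-homomorphism from $I$ to $J$ is a function $h : \mathrm{adom}(I) \to \mathrm{adom}(J)$ such that (i) $R(\bar{a}) \in I$ implies $R(h(\bar{a})) \in J$ for all relation symbols $R \in \mathcal{R}$ and all tuples $\bar{a}$ of constants, and (ii) $h(c) = c$ for all $c \in C \cap \mathrm{adom}(I)$. If such $h$ exists, we say that $I$ is $C$-homomorphic to $J$, and write $I \to_C J$; in addition, we write $I \to_C^{\mathrm{inj}} J$ if $h$ is injective. For simplicity, $C$ will be dropped if it is empty.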
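The homomorphism notion above can be illustrated by a small brute-force search. This is only a sketch under assumptions: instances are encoded as Python sets of `(relation_name, tuple_of_constants)` pairs, and the names `adom` and `find_homomorphism` are hypothetical helpers, not notation from the paper. The search is exponential and meant for toy inputs only.

```python
from itertools import product

def adom(instance):
    """Active domain: all constants occurring in the facts of an instance."""
    return {c for (_, args) in instance for c in args}

def find_homomorphism(src, dst, fixed=frozenset(), injective=False):
    """Search for a homomorphism from src to dst that fixes the constants in `fixed`.

    Returns a dict from adom(src) to adom(dst), or None if no such map exists.
    """
    dom = sorted(adom(src))
    cod = sorted(adom(dst))
    for image in product(cod, repeat=len(dom)):
        h = dict(zip(dom, image))
        if injective and len(set(image)) != len(dom):
            continue  # two constants collapsed: not injective
        if any(h[c] != c for c in dom if c in fixed):
            continue  # condition (ii): fixed constants must map to themselves
        if all((rel, tuple(h[a] for a in args)) in dst for (rel, args) in src):
            return h  # condition (i): every fact is preserved
    return None

path = {("E", ("a", "b")), ("E", ("b", "c"))}
loop = {("E", ("u", "u"))}
print(find_homomorphism(path, loop))                  # maps every constant to "u"
print(find_homomorphism(path, loop, injective=True))  # None
```

The example shows why injectivity matters: a path collapses onto a self-loop under an ordinary homomorphism, but no injective map exists.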

Queries. Fix $\mathcal{R}$ as a relational schema. By a query over $\mathcal{R}$ (or $\mathcal{R}$-query) we mean a formula built upon atoms over $\mathcal{R}$ in some logic. The logic could be first-order logic, second-order logic, or other variants. A query is boolean if it has no free variables. For convenience, given any query $q$, let $\mathrm{const}(q)$ denote the set of constants that occur in $q$.

Every first-order formula is called a first-order query. A conjunctive query (CQ) is a query of the form $\exists \bar{y}\, \varphi(\bar{x}, \bar{y})$ where $\varphi$ is a finite but nonempty conjunction of relational atoms. Let $q$ be a boolean CQ. We use $[q]$ to denote the database that consists of all the atoms that appear in $q$, where variables in atoms are regarded as special constants. The Gaifman graph of $q$ is an undirected graph with each term in $q$ as a vertex, and with each pair of distinct terms as an edge if they co-occur in some atom in $q$. A boolean CQ is called acyclic if its Gaifman graph is acyclic. A union of conjunctive queries (UCQ) is a first-order formula built upon atoms by the connectives $\wedge$, $\vee$ and the quantifier $\exists$. Clearly, every UCQ is equivalent to a disjunction of CQs.
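The acyclicity condition on the Gaifman graph can be checked mechanically. The following is a minimal sketch, assuming a boolean CQ is given as a list of `(relation_name, tuple_of_terms)` atoms with terms written as strings; the function names are hypothetical. It uses a union-find structure: the graph is acyclic exactly when no edge joins two terms that are already connected.

```python
def gaifman_edges(atoms):
    """Undirected edges between distinct terms that co-occur in some atom."""
    edges = set()
    for _, terms in atoms:
        distinct = sorted(set(terms))
        for i, s in enumerate(distinct):
            for t in distinct[i + 1:]:
                edges.add((s, t))  # stored once, with s < t
    return edges

def is_acyclic_cq(atoms):
    """True iff the Gaifman graph of the CQ is a forest."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, t in gaifman_edges(atoms):
        rs, rt = find(s), find(t)
        if rs == rt:
            return False  # this edge closes a cycle
        parent[rs] = rt
    return True

print(is_acyclic_cq([("E", ("x", "y")), ("E", ("y", "z"))]))                     # True
print(is_acyclic_cq([("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]))  # False
```

Note that, under this definition, any atom with three or more distinct terms already induces a triangle, so acyclic CQs in this sense only use atoms with at most two distinct terms.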

Every Datalog program consists of a finite set of rules of the form $\alpha \leftarrow \beta$, where $\alpha$ is a relational atom and $\beta$ is a finite conjunction of atoms or negated atoms; $\alpha$ and $\beta$ are called the head and the body of the rule, respectively. Each variable in $\alpha$ should have at least one positive occurrence in $\beta$. A relation symbol is called intensional if it has at least one occurrence in the head of some rule, and extensional otherwise. No intensional relation symbol is allowed to appear in a negated atom. A Datalog query is of the form $(\Pi, P)(\bar{x})$ where $\Pi$ is a Datalog program, P an extensional relation symbol, and $\bar{x}$ a variable tuple of a proper length. It is well known that every Datalog query can be translated to an equivalent formula in least fixpoint logic; see, e.g., [8].
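To make the rule semantics concrete, here is a minimal sketch of naive bottom-up evaluation. It covers only the negation-free fragment (negated extensional atoms, which the definition above permits, are left out), and its encoding is an assumption of this sketch: atoms are `(relation_name, tuple_of_terms)` pairs, variables are strings starting with an uppercase letter, and `naive_eval` is a hypothetical name. It requires Python 3.8 or later.

```python
def is_var(term):
    """Convention of this sketch: variables start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, subst):
    """Extend `subst` so that `atom` matches `fact`; return None on failure."""
    (rel, terms), (frel, fargs) = atom, fact
    if rel != frel or len(terms) != len(fargs):
        return None
    s = dict(subst)
    for t, a in zip(terms, fargs):
        if is_var(t):
            if s.setdefault(t, a) != a:
                return None  # variable already bound to a different constant
        elif t != a:
            return None      # constant mismatch
    return s

def naive_eval(rules, facts):
    """Apply all rules repeatedly until no new fact is derived (least fixpoint)."""
    db = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for f in db
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                rel, terms = head
                new_fact = (rel, tuple(s.get(t, t) for t in terms))
                if new_fact not in db:
                    db.add(new_fact)
                    changed = True
    return db

# Transitive closure: T(X,Y) <- E(X,Y);  T(X,Z) <- E(X,Y), T(Y,Z)
rules = [
    (("T", ("X", "Y")), [("E", ("X", "Y"))]),
    (("T", ("X", "Z")), [("E", ("X", "Y")), ("T", ("Y", "Z"))]),
]
facts = {("E", ("a", "b")), ("E", ("b", "c"))}
print(sorted(naive_eval(rules, facts)))
```

The loop terminates because only finitely many facts can be built from the constants of the input database, which is also the intuition behind the polynomial-time data complexity of Datalog.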

Only boolean queries will be used in this work. For convenience, we employ CQ, ACQ and UCQ to denote the classes of boolean CQs, boolean acyclic CQs and boolean UCQs, respectively. Let FO denote the class of boolean first-order queries, FO(+, ×) denote the class of boolean first-order queries that involve two built-in arithmetic relations + and ×, and Datalog(Succ) denote the class of boolean Datalog queries that involve a built-in successor relation Succ, and special constants min and max, denoting the minimum and the maximum elements, respectively, under the underlying order. Given a class $\mathsf{Q}$ of queries and a relational schema $\mathcal{R}$, let $\mathsf{Q}[\mathcal{R}]$ denote the class of $\mathcal{R}$-queries that belong to $\mathsf{Q}$.

In the theory of descriptive complexity [10], it was proved that FO(+, ×) and Datalog(Succ) exactly capture the complexity classes AC0 and PTIME respectively, where AC0 denotes the class of languages recognized by a uniform family of circuits with constant depth and polynomial size, and PTIME denotes the class of languages recognized by a deterministic Turing machine in polynomial time.

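Dependencies. A disjunctive embedded dependency (DED) over a relational schema $\mathcal{R}$ is a sentence $\sigma$ of the form

$$\forall \bar{x} \forall \bar{y} \big( \varphi(\bar{x}, \bar{y}) \rightarrow \exists \bar{z}_1 \psi_1(\bar{x}, \bar{z}_1) \vee \cdots \vee \exists \bar{z}_k \psi_k(\bar{x}, \bar{z}_k) \big)$$

where $k \geq 0$, $\varphi$ is a conjunction of relational $\mathcal{R}$-atoms involving terms from $\bar{x} \cup \bar{y}$ only, each $\psi_i$ is a conjunction of atoms over $\mathcal{R}$ involving terms from $\bar{x} \cup \bar{z}_i$ only, and each variable in $\bar{x}$ has at least one occurrence in $\varphi$. In particular, $\sigma$ is called a tuple-generating dependency (TGD) if it is equality-free and $k = 1$. For simplicity, we omit the universal quantifiers and the brackets appearing outside the atoms.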

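Let $D$ be a database, $\Sigma$ a set of (first-order) sentences, and $q$ a boolean query; all of them are over a common relational schema $\mathcal{R}$. We write $D \cup \Sigma \vDash q$ if, for all $\mathcal{R}$-instances $I$ with $D \subseteq I$, if $I \vDash \sigma$ for all sentences $\sigma \in \Sigma$ then $I \vDash q$, where the satisfaction relation $\vDash$ is defined in the standard way.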

Ontologies and Languages in OMQA

Before identifying the existence of universal languages for OMQA, we need some notions to clarify what an ontology in OMQA is, and what an ontology language in OMQA is. To make the presented results more applicable, we will define these notions in a language-independent way.

To define ontologies in OMQA, below we generalize the notion introduced in [14] from CQs to more general query languages such as UCQs.

Definition 1.

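Let $\mathcal{D}$ and $\mathcal{Q}$ be relational schemas, and $\mathsf{Q}$ a class of queries. A quasi-OMQA[$\mathsf{Q}$]-ontology over $(\mathcal{D}, \mathcal{Q})$ is a set $O$ of ordered pairs $(D, q)$, where $D$ is a $\mathcal{D}$-database and $q$ a boolean $\mathcal{Q}$-query in $\mathsf{Q}$ such that $\mathrm{const}(q) \subseteq \mathrm{adom}(D)$.

Moreover, a quasi-OMQA[$\mathsf{Q}$]-ontology $O$ over $(\mathcal{D}, \mathcal{Q})$ is called an OMQA[$\mathsf{Q}$]-ontology if all of the following hold:

  1. $O$ is closed under query conjunctions, i.e.,
    if $(D, q) \in O$ and $(D, p) \in O$, then $(D, q \wedge p) \in O$.

  2. $O$ is closed under query implications, i.e.,
    if $(D, q) \in O$ and $q \vDash p$, then $(D, p) \in O$.

  3. $O$ is closed under injective database homomorphisms, i.e.,
    if $(D, q) \in O$ and there is an injective $\mathrm{const}(q)$-homomorphism from $D$ to $D'$, then $(D', q) \in O$.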

Given any logical theory , we can interpret it as a quasi-OMQA[]-ontology over as follows:

It is easy to see that, for theories in almost all classical logics, the quasi-ontology obtained in this way is indeed an OMQA-ontology.

With the notion of ontology, we are then able to present an abstract definition for ontology languages in OMQA.

Definition 2.

Let be a finite but nonempty set, and relational schemas, and a class of queries. Then every OMQA[]-language over (with vocabulary ) is defined as an ordered pair such that:

  1. consists of a decidable set of theories, each of which is a finite string over (i.e., an element of );

  2. is a semantic mapping, i.e., a function that maps each theory in to an OMQA[]-ontology over .

Example 1.

Let and be relational schemas, a class of queries, and a decidable class of finite sets of DEDs. Let be a function that maps each set to . It is easy to see that is an OMQA[]-language.

The language defined above is called a DED-language over (induced by ). In particular, if consists of all finite sets of DEDs, we call it the full DED-language over . Unfortunately, it has been proved in [13] that query answering with the full DED-language is uncomputable. In this work, we thus focus on tractable OMQA-languages. We will consider two kinds of tractability:

Definition 3.

Let and be relational schemas, and classes of queries, and a complexity class. An OMQA[]-language over is

  1. -rewritable if there is a computable function that maps each ordered pair to a boolean query such that iff ; in this case, is called a -rewriting function of .

  2. -compilable if there is a computable function that maps each ordered pair to a Turing machine , whose running time belongs to the class , such that iff accepts on the input ; in this case, is called a -compiler of .

Example 2.

According to [4], the language of linear TGDs is both FO-rewritable and AC0-compilable, and the language of guarded TGDs is both Datalog-rewritable and PTIME-compilable.

Remark 1.

Clearly, there is a nonuniform way to redefine notions in Definition 3 by allowing rewriting functions and compilers to be uncomputable. However, it is worth noting that languages defined in such a way could be intractable. In fact, there is a nonuniform FO-rewritable OMQA-language in which the query answering is highly undecidable.

Next we give the definition of universal OMQA-language.

Definition 4.

Let be a class of queries, and relational schemas, and and OMQA[]-languages over . Then we say that is at least as expressive as , written , if for each theory there is a theory such that ; and has the same expressiveness as if both and .

An OMQA[]-language is called universal for a family of OMQA[]-languages over if (i) , and (ii) for all languages , we have that .

Nonexistence for the General Case

One ambitious goal in OMQA is to find some universal language for the tractable OMQA. Unfortunately, the following theorem shows that this goal is in general unachievable.

Theorem 1.

Let and be relational schemas such that contains a relation symbol of arity , and suppose and ACQ . Then there is no universal language for the family of -rewritable OMQA[]-languages over .

Since AC0 and PTIME are exactly captured by FO and Datalog respectively, by Theorem 1 we have

Corollary 2.

Let and be relational schemas such that contains at least one relation symbol of arity , and suppose AC, PTIME and ACQ . Then there is no universal language for the family of -compilable OMQA[]-languages over .

To prove Theorem 1, the general idea is to implement a diagonalization argument as follows. Assume by contradiction that there is a universal language for the desired family. We first give an effective enumeration for all nontrivial ontologies defined in the universal language. With this enumeration, we then construct a new OMQA[]-ontology and a new language in which is definable; Finally we show that is still -rewritable, which leads to a contradiction.

Proof of Theorem 1.

Only consider the case where and . Assume by contradiction that there is a universal language for FO-rewritable OMQA[UCQ]-languages over . Let be such a language. Our task is to define another FO-rewritable OMQA[UCQ]-language that is strictly more expressive than . To do this, we first construct an ontology that is not definable in .

Before we present the construction, some notations are needed. W.l.o.g., we assume that there is a binary relation symbol in . Note that, by a repetition of the parameters, can be always simulated by another relation symbol of arity . For example, one can use to simulate . With this assumption, we first define a sequence of acyclic CQs. For all integers , we define

Intuitively, asserts that there is a cycle-free path (via ) of length in the intended model.

Let be an effective enumeration (i.e., there is a Turing machine that generates such an enumeration) of all the theories in . Such an enumeration clearly exists. Now our task is to construct countably infinite sequences and , where is a sequence of positive integers, and is a sequence of theories in . The sequences are required to have the following properties:

  1. is monotonically increasing, i.e., if ;

  2. For all there exists a database with and ;

  3. For all there exists such that for all databases with .

Procedure 1 generates the desired sequences.

1 ;
2 for  to  do
3       for  to  do
4             for  to  do
5                   if   then  goto line 9 ;
6                   if   then
7                         ;
8                         goto line 9 ;
9                        
10                  
11            
12       ;
13       ;
14       delete from ;
15      
Procedure 1 Generating Sequences and

Now we have the following property:

Claim 1. The sequences and generated by Procedure 1 satisfy Properties (1-3).

Proof.

Properties 1 and 2 are clear from Procedure 1. So it remains to show Property 3. Suppose for some . Since has no occurrence in , according to Procedure 1, we know that, whenever lines 5 and 6 are executed for , conditions in both “if” statements must be false. (Otherwise we will have .) In addition, as increases arbitrarily, we know that line 6 must be executed. This means that there is some such that for all databases with , which then yields the claim. ∎

Now we are able to construct the desired ontology. To do this, we first define some notations. For , let denote the sentence , which asserts that the intended domain contains at least elements. Given a boolean UCQ , if there exists an integer such that , let denote where is the least integer among such s, and let denote the sentence (always false) if no such s exist. Furthermore, we define

It is not difficult to prove the following properties:

Claim 2. Let and be boolean UCQs. Then we have:

  1. If then ;

Proof.

1. For the case where there exists no integer such that , we have that , which is always unsatisfiable. This implies that trivially.

Now it remains to show the case where there exists such that . Let be the least integer such that . Then we have . From , we then have . Let be the least integer such that . Then it is clear that . According to Property 1, we also know that , which implies that , or equivalently, . This proves Statement 1.

2. For the case where there is no integer such that , we have that , which implies the desired equivalence. The same argument applies to the case where there is no integer such that .

Now, it remains to consider the case where there are integers and such that and . Let and denote the least integers among such s and s, respectively. W.l.o.g., suppose . Then we have both and . Combining both of them, we know that is the least integer such that . Thus, we have that . On the other hand, it is also clear that , or equivalently , which implies that . Consequently, we obtain that , which completes the proof. ∎

Claim 3. is an OMQA[UCQ]-ontology.

Proof.

The closure property of under injective database homomorphisms is clear since, for any boolean UCQ , is preserved under injective homomorphisms.

Next we show that is closed under query conjunctions. Suppose and . By definition, we have both and , which means that . By Statement 2 of Claim 2, we then have that , which implies that as desired.

Now it remains to show the closure of under query implications. Suppose and . We need to prove . From we have , and from , we have by Statement 1 of Claim 2. Combining both of these, we obtain . By definition, it follows that , which completes the proof. ∎

Claim 4. for any theory .

Proof.

First consider the case where occurs in , and suppose for some . According to the definition of , we know that there is a database with and . On the other hand, by the definition of has no model with , which means that there is no database with and . Consequently, we have .

Now it remains to consider the case where does not occur in . By Claim 1, it suffices to show that for every integer there exists a database with and , or equivalently, has a model that contains at most elements. According to the definition of , the latter is indeed true. This also implies that , and completes the proof immediately. ∎

With Claims 3 and 4, we are now in the position to prove the desired theorem. Let be a binary string such that , and let . Following the decidability of , we have the decidability of . Let be a function that extends by mapping to , and let . By Claim 3, we know that is an OMQA[UCQ]-language. Suppose is an FO-rewriting function. Let be a function that extends by mapping to for all boolean UCQs . By a slight modification to Procedure 1, one can easily devise an algorithm to compute (and ) on given integer . This implies that is computable. By definition, we know that is an FO-rewriting function, which implies that is FO-rewritable. By Claim 4, we also know that is strictly more expressive than , a contradiction as desired. And this completes the proof. ∎

Remark 2.

Since the sentence defined in the above proof is also an FO-sentence, the proof directly applies to the case of FO. For a proof of the remaining case, one can convert to a Datalog-program.

Locality to the Rescue

In the last section, we proved that there is no universal language for tractable OMQA in general. Then, a natural question arises as to whether one can find a natural property that approximates the tractability but still allows the existence of a universal language. The challenge here is that the property should be manageable enough to avoid a diagonalization argument (see the proof of Theorem 1). Below we propose a property as an approximation of the FO-rewritability.

Locality as Approximation of FO-rewritability

A bound function is a computable function such that for . To simplify the presentation, we fix a way to represent bound functions, e.g., one can represent each bound function by a Turing machine that computes it. A class of bound functions is called decidable if the class of representations of those bound functions is decidable.

To measure the size of a query, we fix a computable function that maps each UCQ to a positive integer. Clearly, there are many methods to define . The only restriction is that we require for all UCQs and .

Definition 5.

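Let $\mathcal{D}$ and $\mathcal{Q}$ be relational schemas, $\mathsf{Q}$ a class of queries, $O$ an OMQA[$\mathsf{Q}$]-ontology over $(\mathcal{D}, \mathcal{Q})$, and $f$ a bound function. Then $O$ is called $f$-local if for all boolean $\mathcal{Q}$-queries $q \in \mathsf{Q}$ and all $\mathcal{D}$-databases $D$ there is a set $A \subseteq \mathrm{adom}(D)$, which consists of at most $f(\|q\|)$ constants, such that

$$(D, q) \in O \quad\text{iff}\quad (D|_A, q) \in O.$$

Furthermore, given an OMQA[$\mathsf{Q}$]-language $\mathcal{L}$, a bound function $f$ and a class $\mathcal{F}$ of bound functions, $\mathcal{L}$ is called $f$-local if all OMQA[$\mathsf{Q}$]-ontologies defined in $\mathcal{L}$ are $f$-local, and $\mathcal{L}$ is $\mathcal{F}$-local if it is $f$-local for some bound function $f \in \mathcal{F}$.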
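The locality condition can be read as a search problem: find a small set of constants such that restricting the database to it does not change the answer. The sketch below makes this explicit under several assumptions of its own: the ontology is given as a membership oracle `entails(db, query)`, the bound is passed as a plain integer, and the names `restrict` and `locality_witness` are hypothetical. It is a brute-force illustration, not a procedure from the paper.

```python
from itertools import combinations

def restrict(db, consts):
    """Sub-database of the facts that mention only constants in `consts`."""
    return {(rel, args) for (rel, args) in db if all(a in consts for a in args)}

def locality_witness(db, query, entails, bound):
    """Return a set A of at most `bound` constants such that the answer on
    the restriction of db to A agrees with the answer on db, or None."""
    dom = sorted({a for (_, args) in db for a in args})
    target = entails(db, query)
    for size in range(min(bound, len(dom)) + 1):
        for subset in combinations(dom, size):
            if entails(restrict(db, set(subset)), query) == target:
                return set(subset)
    return None

# Toy oracle (an assumption): the query holds iff the database has an E-fact.
has_edge = lambda db, query: any(rel == "E" for (rel, _) in db)
db = {("E", ("a", "b")), ("E", ("c", "d"))}
print(locality_witness(db, "exists-edge", has_edge, 2))  # a two-constant witness
print(locality_witness(db, "exists-edge", has_edge, 1))  # None: bound too small
```

In the toy example a witness needs both endpoints of some edge, so a bound of 2 suffices while a bound of 1 does not.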

One might question why bounded locality is a good approximation to first-order rewritability. Consider the class of first-order sentences built on atoms and inequalities by using the connectives $\wedge$, $\vee$ and the quantifier $\exists$. Obviously, this class is exactly the class of UCQs with inequalities. It has been observed in [Benedikt16] that this class captures the class of first-order sentences that are preserved under injective homomorphisms. It remains open whether such a preservation theorem holds on finite structures (or databases). If this is indeed true, by the following proposition we then have that an OMQA-language is FO-rewritable iff it is $f$-local for some bound function $f$.

Proposition 3.

Let and be relational schemas, an OMQA[UCQ]-ontology over , and a bound function. Then is -local iff for each boolean -UCQ there is a -sentence involving at most terms such that iff for all -databases .

Proof.

For the direction of “if”, let us assume that for each boolean -UCQ there is a -sentence involving at most terms such that iff for all -databases . We need to show that is -local. Let be a boolean -UCQ, and a -sentence involving at most terms such that iff for all -databases . By the assumption, such a sentence does exist. Suppose where is a quantifier-free existential positive first-order formula with inequalities and involving at most terms. Let be a -database. If , then let , where is an assignment such that . Otherwise, let be any subset of such that . In both cases we have the following: (i) , and (ii) iff . From the latter, we know that iff . We thus obtain that is -local, as desired.

Conversely, suppose is -local. Let be a boolean -UCQ. Given a -database , let be a witness of the locality of w.r.t. , let denote the conjunction of all facts in ; let be the formula obtained from by replacing each constant that does not occur in by a fresh variable; and let denote the sentence , where is the tuple of all variables occurring in , and denotes the conjunction of for each pair of distinct variables . It is easy to see that, up to logical equivalence, there is only a finite number of for all -databases such that . Let be a disjunction of for all -databases with and . Clearly, is equivalent to a -sentence that involves at most terms. To complete the proof, it suffices to show the following property:

Claim. iff for all databases over .

Now it remains to prove the claim. Let be a database over . We first prove the direction of “only if”. Suppose . Since is -local, there should be a -database such that and . By the definition of , we know that is equivalent to a disjunct of . It is clear that . From we then have , which implies as desired.

For the converse, we assume that . Then there is a database over with such that (i) and (ii) is a disjunct of . From (ii) we have , which means that there is an injective -homomorphism from to , where . As is closed under injective database homomorphisms, we have as desired, which completes the proof. ∎
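The sentence built from a small database in the proof above (facts conjoined, constants outside the query replaced by fresh variables, distinct variables forced apart, everything existentially closed) can be sketched as string construction. The encoding and the name `canonical_sentence` are assumptions of this sketch; the output syntax (`exists`, `&`, `!=`) is ad hoc.

```python
def canonical_sentence(db, query_constants=frozenset()):
    """Existential sentence with inequalities describing `db` up to injective
    renaming of the constants that do not occur in the query."""
    consts = sorted({a for (_, args) in db for a in args})
    renamed = [c for c in consts if c not in query_constants]
    var = {c: "x%d" % i for i, c in enumerate(renamed)}  # fresh variables
    atoms = sorted(
        "%s(%s)" % (rel, ", ".join(var.get(a, a) for a in args))
        for (rel, args) in db
    )
    names = list(var.values())
    # One inequality for each pair of distinct variables.
    diffs = ["%s != %s" % (u, v) for i, u in enumerate(names) for v in names[i + 1:]]
    prefix = "".join("exists %s. " % v for v in names)
    return prefix + " & ".join(atoms + diffs)

print(canonical_sentence({("E", ("a", "b"))}))
# exists x0. exists x1. E(x0, x1) & x0 != x1
print(canonical_sentence({("E", ("a", "b"))}, query_constants={"a"}))
# exists x0. E(a, x0)
```

A database satisfies such a sentence exactly when the small database embeds into it injectively while keeping the query constants fixed, which is why closure under injective database homomorphisms is what the converse direction of the proof relies on.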

Remark 3.

Proposition 3 reveals an intrinsic connection between the bounded locality and the complexity of rewritings. We will elaborate this in an extended version of this paper.

Universal Language for Local OMQA

Now it remains to know whether the bounded locality allows the existence of universal languages. For convenience, in the rest of this section, we fix as a decidable class of bound functions; fix and as a pair of disjoint relational schemas. The disjointness will not introduce any real limitation. For instance, in a DED-language, given any set of DEDs, one can construct another set of DEDs by introducing a fresh relation symbol for each , and adding copy rules of the form . Clearly, has the same behaviour over as over , where denotes the schema consisting of all the fresh symbols.

Surprisingly, we have the following result.

Theorem 4.

Let be a decidable class of UCQs. Then there exists a DED-language that is universal for the family of -local OMQA[]-languages over .

Let be any bound function in . To prove Theorem 4, the general idea is to develop a transformation that converts every DED set to an -local DED set. In addition, for each DED set that is already -local, the transformation is required to preserve the semantics of query answering. If such a transformation exists, since DED is universal for the family of OMQA-languages in which query answering is recursively enumerable, we then obtain a universal language for the family of -local OMQA-languages.

Let us begin with a finite set of DEDs over a relational schema . To implement the desired transformation, we first show how to construct an -local OMQA[]-ontology from . As a natural idea, one may expect to define the desired ontology by removing all the pairs from the original ontology (defined by ) where is not -local on . Unfortunately, the ontology defined above is in general not well-defined. To construct the desired ontology, the -locality and the closure under both query conjunctions and query implications should be maintained simultaneously.

Below we explain how to construct the ontology. We first need to fix a strict linear order over -UCQs. The strict linear order is required to satisfy for all -UCQs and such that . Clearly, such an order always exists. For the given set of DEDs, let be the set that consists of the ordered pair if is a -database, is a -UCQ in , and the following condition holds:

(1)

where is the set of boolean UCQs such that and , and denotes the conjunction of and all UCQs in . Moreover, we define as the minimum superset of that is closed under query conjunctions, query implications and injective database homomorphisms.

The constructed ontology enjoys several properties which will play important roles in our proof for Theorem 4.

Lemma 5.

If is -local, then .

Proof.

Follows from the definition of . ∎

Lemma 6.

is an -local OMQA[]-ontology.

Proof.

We first claim that, for all UCQs with , there exists an integer and UCQs such that for each and . This can be proved by a routine induction of the construction of .

With the claim, w.l.o.g., we assume . Let denote the query . Obviously, it holds that , which implies that immediately. On the other hand, by definition we know that is -local on , i.e., there is a set such that . Let . We then have and . Consequently, we obtain that is -local on .

Now, it remains to show that is -local on . For the case where , from and , we obtain that , which implies that is -local on . For the other case, it must be true that . From the fact that , we know that . Since is true, by Condition (1) we know that there is a set such that , which means that is -local on . This then completes the proof. ∎

Lemma 7.

is recursively enumerable.

Proof.

The lemma is a corollary of the following facts: (1) the validity problem for inference in first-order logic is recursively enumerable; (2) the query containment problem for UCQs is decidable; (3) there are only a finite number of boolean -UCQs with ; (4) there are only a finite number of subsets of ; (5) both and are computable; (6) is decidable. ∎

Now, to define the transformation, it remains to show how to encode the ontology by another set of DEDs. Suppose is the underlying database, and the underlying query. The encoding will be implemented in the following way:

  1. Simulate the query answering of under and .

  2. If the answer of Stage 1 is positive, then nondeterministically copy disjuncts of to generate the universal models.

The main challenges of implementing the above encoding are as follows. Firstly, instead of a single universal model, we need to generate a set of universal models in Stage 2. It is not clear whether the technique of generating universal model in [14] can be applied to this situation. Secondly, to encode the computation in Stage 1, a successor relation is needed. But it seems impossible to define such a relation in the language of DEDs explicitly.

Below we explain how to implement the encoding.

Defining Successor and Arithmetic Relations.   To implement the desired encoding, a successor relation needs to be defined so that the constants in the underlying database can be ranged over. As there is no negation in the body of DEDs, it seems impossible to construct DEDs to traverse ALL constants in . Fortunately, thanks to the closure of OMQA-ontologies under injective database homomorphisms, we do not need a successor relation on the full domain. The reason is as follows. Suppose we want to show that a query is derivable from the database under some ontology . As is closed under injective database homomorphisms, it is equivalent to show whether there is a subset of such that is derivable from under .

To range over subsets of , we employ the partial successor relations on , each of which is a successor relation on some subset of . Clearly, there is a partial successor relation for each subset of . With the mentioned property, we will define some DEDs to generate partial successor relations on . To check whether is derivable from under , it would be sufficient to test whether the computation of Stage 1 halts with “accept” under a certain partial successor relation.

Our method for generating partial successor relations is inspired by the technique of RudolphT2015 for defining successor relations in the language of TGDs. In that paper, it is shown that every homomorphism-closed database query can be defined by a set of TGDs. It is worth noting that the ontology mediated queries considered in this paper are not necessarily closed under homomorphisms, so their technique cannot be applied directly. Fortunately, a linear order on can be easily defined by a set of DEDs, and with this order, we are able to use their idea to generate all partial successor relations that are compatible with the defined order. Now we show how to implement this idea.

Let AD be a unary relation symbol that will be interpreted as . Clearly, such a relation can be easily defined by some DEDs. With the relation AD, a linear order relation Less over AD can then be defined in a routine way:

(2)
(3)
(4)

To generate all partial successor relations compatible with Less, we link each constant in with an alias by the relation . Suppose and are aliases of constants and , respectively. By we mean that is the immediate successor of in the underlying successor relation. The head (resp., the tail) of a partial successor relation is denoted by (resp., ). In particular, we use as the name of the underlying relation. Every partial successor relation is required to have a head and a tail. To generate these relations, we use the following DEDs:

(5)
(6)
(7)
(8)

The following example illustrates in more detail how these DEDs work.

Figure 1: The Instance Generated in Example 3.
Example 3.

Let be a database which involves only constants and , and suppose the linear order defined by Less is . By an exhaustive application of DEDs (6-8), we obtain an instance as illustrated by Figure 1. (To make the figure simple, we use the semi-oblivious chase.)

As seen in Figure 1, there are 7 partial successor relations generated in the instance. For instance, defines the partial successor relation involving only ; defines the relation ; defines the relation .
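The count of 7 can be checked with a small sketch. It is only an illustration under stated assumptions: it assumes the database of Example 3 has three constants (inferred from 7 = 2^3 - 1), uses the hypothetical names a, b, c listed in the order given by Less, and enumerates the relations directly instead of chasing the DEDs. The function name is likewise hypothetical.

```python
from itertools import combinations

def partial_successor_relations(ordered_constants):
    """Yield (subset, successor_pairs) for every nonempty subset of the constants.

    `ordered_constants` is assumed to be listed in increasing order w.r.t. Less.
    """
    n = len(ordered_constants)
    for size in range(1, n + 1):
        # combinations() keeps the input order, so each subset is sorted by Less.
        for subset in combinations(ordered_constants, size):
            # Consecutive elements of the sorted subset form the successor pairs;
            # subset[0] plays the role of the head and subset[-1] of the tail.
            successor_pairs = list(zip(subset, subset[1:]))
            yield subset, successor_pairs

relations = list(partial_successor_relations(["a", "b", "c"]))
for subset, pairs in relations:
    print(subset, pairs)
```

Each nonempty subset contributes exactly one successor relation compatible with the order, which is why three constants give seven relations.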

To encode Stage 1 mentioned before, we need to generate a linear order and the corresponding successor relation on a countably infinite domain, making sure they are compatible with the underlying partial successor relation on . More relations are needed to do this. means that is the least element under the order ; states that is the largest element in under the order ; denotes that is the immediate successor of under the order ; and asserts that is less than under the order . For a technical reason, we also need some auxiliary relations. denotes that is not less than the largest element in under the order , and means that is an alias of and it is used to build the order . All of these are defined by the following DEDs:

(9)
(10)
(11)
(12)
(13)
(14)

With these relations, it is routine to define arithmetic relations such as (asserting that under the order ) and (asserting that the -th bit of the binary representation of is ). We omit the details here.
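As a rough illustration of why such definitions are routine, the sketch below derives an addition relation from a successor chain by saturating two Datalog-style rules. It is not the authors' construction: the ternary relation, the rule shapes, and the names `addition_from_successor`, `e0`, `e1`, ... are assumptions made for this example, and the chain is finite only so that the computation terminates.

```python
def addition_from_successor(chain):
    """Compute all facts (x, y, z), read as x + y = z, derivable on a finite chain.

    `chain` lists elements in successor order; chain[0] is the least element.
    Assumed rules (hypothetical, for illustration only):
        Add(x, zero, x)
        Add(x, y, z), Succ(y, y'), Succ(z, z')  ->  Add(x, y', z')
    """
    succ = dict(zip(chain, chain[1:]))      # immediate-successor map
    zero = chain[0]
    add = {(x, zero, x) for x in chain}     # base rule: x + 0 = x
    frontier = list(add)
    while frontier:                         # saturate the recursive rule
        x, y, z = frontier.pop()
        if y in succ and z in succ:
            fact = (x, succ[y], succ[z])
            if fact not in add:
                add.add(fact)
                frontier.append(fact)
    return add

add_facts = addition_from_successor(["e0", "e1", "e2", "e3"])
print(("e1", "e2", "e3") in add_facts)
```

Neither rule uses negation, which is the feature that makes this style of definition compatible with DED bodies.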

Simulating Query Answering under .   With a partial successor relation and the related arithmetic relations, we are now in a position to define some DEDs to simulate the query answering of under and .

Our encoding of the simulation of query answering is almost the same as that in Section 5.3 of [14]. As proved in ZhangZY16 (Proposition 6), all recursively enumerable OMQA-ontologies can be recognized by a certain class of Turing machines, called convergent -bounded nondeterministic Turing machines. Although the queries involved in that work are only boolean CQs, by a similar argument one can show that the result can be generalized to the case where boolean UCQs are involved. The only difference is that, to deal with UCQs, we have to change the format of the input slightly. Due to the space limit, we omit the details here.

With the result mentioned above, we can then find a convergent -bounded nondeterministic Turing machine to recognize . By employing the DEDs defined in Section 5.3 of [14] (with a slight modification to specify the partial successor relation), we are then able to simulate the computation of on the input .

To record the result of the query answering, we use a binary relation symbol Accept. By we mean that the machine halts on input with “accept”, and is the partial successor relation used to implement the computation.

Generating Universal Models.   By applying all the DEDs that we have constructed, we will obtain the class of boolean UCQs that are derivable from under . With such a class of UCQs, now our task is to construct a universal model set.

Given a class of databases and a set of constants, let

where, for each database , is an isomorphic copy of such that, for any pair of distinct databases , only constants from will be shared by and .
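The renaming condition can be sketched as follows. This is a minimal illustration, assuming the elided definition is the union of the renamed copies and that a database is a set of facts of the form (relation name, tuple of constants); the function name `disjoint_union` and this fact encoding are hypothetical.

```python
def disjoint_union(databases, shared):
    """Union of isomorphic copies of the databases that share only `shared`.

    `databases` is a list of sets of facts; a fact is (relation, args) with
    `args` a tuple of constants. Constants in `shared` are kept as they are;
    every other constant is tagged with the index of its database, so that
    copies of distinct databases have no other constant in common.
    """
    result = set()
    for index, database in enumerate(databases):
        for relation, args in database:
            renamed = tuple(c if c in shared else (c, index) for c in args)
            result.add((relation, renamed))
    return result

d1 = {("R", ("a", "x"))}
d2 = {("R", ("a", "x")), ("S", ("x",))}
union = disjoint_union([d1, d2], shared={"a"})
print(sorted(union, key=repr))
```

Here the constant `x` occurs in both inputs but is renamed apart, while `a` stays shared across the two copies.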

Let be a -database, and an OMQA[UCQ]-ontology over . Given a boolean UCQ , let denote the set consisting of for each disjunct (a boolean CQ) of . Let

Let denote the set that consists of for each minimum hitting set of , where .
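For readers less familiar with hitting sets, the following brute-force sketch enumerates them for a small family. It assumes that "minimum" is read as minimal with respect to set inclusion (if minimum cardinality is intended, only the smallest results should be kept), and it represents each UCQ simply as the set of labels of its disjuncts; the labels `p1`, `p2`, `p3` and the function name are hypothetical.

```python
from itertools import combinations

def minimal_hitting_sets(family):
    """Return all inclusion-minimal sets that intersect every member of `family`."""
    family = [frozenset(s) for s in family]
    universe = sorted(set().union(*family)) if family else []
    found = []
    # Candidates are tried by increasing size, so any non-minimal hitting set
    # contains an already found smaller one and is rejected.
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            candidate = frozenset(candidate)
            hits_all = all(candidate & member for member in family)
            if hits_all and not any(h <= candidate for h in found):
                found.append(candidate)
    return found

# Two UCQs, given by the labels of their disjuncts.
hitting = minimal_hitting_sets([{"p1", "p2"}, {"p2", "p3"}])
print([sorted(h) for h in hitting])
```

Each hitting set picks at least one disjunct from every UCQ in the family, which is the property used in the "only if" direction of the claim below.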

Proposition 8.

Let be an OMQA[UCQ]-ontology over , let be a -database, and let be a boolean -UCQ such that . Then iff for all instances .

Proof.

Let denote the set of all boolean UCQs such that . We first show a property as follows.

Claim. iff for all instances .

Proof.

First consider the direction of “only if”. Suppose we have , and let be any instance in . We need to prove that . According to the definition of , we know that there is a minimum hitting set of such that . This implies that for each UCQ there is a disjunct (which is a boolean CQ) of such that has an isomorphic copy in . Consequently, we have that for all boolean UCQs . From the assumption that , we conclude as desired.

Next let us turn to the direction of “if”. Suppose for all instances . Now our task is to prove that . Let be an arbitrary instance such that for all boolean UCQs . Take as any boolean UCQ in . W.l.o.g., we write in the form where each is a boolean CQ. Let be any disjunct of such that . Such a disjunct always exists because . Suppose is of the form