1 Introduction
Answering conjunctive queries (CQs) over OWL 2 DL ontologies is a computationally hard [GHLS08a, DBLP:conf/cade/Lutz08] but key problem in many applications. Thus, considerable effort has been devoted to the development of OWL 2 DL fragments for which query answering is tractable in data complexity, that is, complexity measured in the size of the data only. Most languages obtained in this way are Horn: ontologies in such languages can always be translated into first-order Horn clauses. This includes the families of ‘lightweight’ languages such as DL-Lite [CDLLR07b], $\mathcal{EL}$ [DBLP:conf/ijcai/BaaderBL05], and DLP [DBLP:conf/www/GrosofHVD03] that underpin the QL, EL, and RL profiles of OWL 2, respectively, as well as more expressive languages, such as Horn-$\mathcal{SHIQ}$ [DBLP:conf/ijcai/HustadtMS05] and Horn-$\mathcal{SROIQ}$ [DBLP:conf/ijcai/OrtizRS11].
Query answering can sometimes be implemented via query rewriting: a rewriting of a query $Q$ w.r.t. an ontology $\mathcal{T}$ is another query $Q'$ that captures all information from $\mathcal{T}$ necessary to answer $Q$ over an arbitrary data set. Unions of conjunctive queries (UCQs) and datalog are common target languages for query rewriting. They ensure tractability w.r.t. data complexity, while enabling the reuse of optimised data management systems: UCQs can be answered using relational databases [CDLLR07b], and datalog queries can be answered using rule-based systems such as OWLim [bishop2011owlim] and Oracle’s Semantic Data Store [Wu08]. Query rewriting algorithms have so far been developed mainly for Horn fragments of OWL 2 DL, and they have been implemented in systems such as QuOnto [DBLP:conf/aaai/AcciarriCGLLPR05], Rapid [Chortaras11], Presto [DBLP:conf/kr/RosatiA10], Quest [DBLP:conf/kr/Rodriguez-MuroC12], Clipper [DBLP:conf/aaai/EiterOSTX12], Owlgres [DBLP:conf/owled/StockerS08], and Requiem [Hector10a].
Horn fragments of OWL 2 DL cannot capture disjunctive knowledge, such as ‘every student is either an undergraduate or a graduate’. Such knowledge occurs in practice in ontologies such as the NCI Thesaurus and the Foundational Model of Anatomy, so these ontologies cannot be processed using known rewriting techniques; furthermore, no query answering technique we are aware of is tractable w.r.t. data complexity when applied to such ontologies. These limitations cannot be easily overcome: query answering in even the basic non-Horn language $\mathcal{ELU}$ is coNP-hard w.r.t. data complexity [DBLP:conf/lpar/KrisnadhiL07], and since answering datalog queries is PTime-complete, it may not be possible to rewrite an arbitrary $\mathcal{ELU}$ ontology into datalog unless PTime = coNP. Furthermore, Lutz and Wolter [Lutz:2012ug] showed that tractability w.r.t. data complexity cannot be achieved for an arbitrary non-Horn ontology $\mathcal{T}$ with ‘real’ disjunctions: for each such $\mathcal{T}$, a query $Q$ exists such that answering $Q$ w.r.t. $\mathcal{T}$ is coNP-hard.
The result by Lutz and Wolter [Lutz:2012ug], however, depends on an interaction between existentially quantified variables in $Q$ and disjunctions in $\mathcal{T}$. Motivated by this observation, we consider the problem of computing datalog rewritings of ground queries (i.e., queries whose answers must map all the variables in $Q$ to constants) over non-Horn ontologies. Apart from allowing us to overcome the negative result by Lutz and Wolter [Lutz:2012ug], this also allows us to compute a rewriting of $\mathcal{T}$ that can be used to answer an arbitrary ground query. Such queries form the basis of SPARQL, which makes our results practically relevant. We summarise our results as follows.
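As an informal illustration of what a datalog rewriting for ground queries amounts to, the following Python sketch uses a hypothetical toy ontology based on the student example (the predicate names, the data, and the candidate rewriting are assumptions made for illustration, not taken from this paper). It computes the ground facts that hold in every model by brute-force case analysis over the disjunction, and compares them with the facts derived by a plain datalog program.

```python
# Hypothetical toy ontology (illustrative only):
#   Student(x)   -> Undergrad(x) OR Grad(x)    (disjunctive, non-Horn)
#   Undergrad(x) -> Enrolled(x)
#   Grad(x)      -> Enrolled(x)
DISJUNCTIVE = [("Student", ("Undergrad", "Grad"))]
HORN = [("Undergrad", "Enrolled"), ("Grad", "Enrolled")]

# Assumed candidate datalog rewriting for ground queries:
# the Horn rules of the ontology plus one additional Horn rule.
REWRITING = HORN + [("Student", "Enrolled")]


def saturate(facts, horn_rules):
    """Naive forward chaining for unary Horn rules body(x) -> head(x)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in horn_rules:
            for pred, const in list(facts):
                if pred == body and (head, const) not in facts:
                    facts.add((head, const))
                    changed = True
    return facts


def models(facts):
    """Enumerate models by branching on each unsatisfied disjunction."""
    facts = saturate(facts, HORN)
    for body, heads in DISJUNCTIVE:
        for pred, const in sorted(facts):
            if pred == body and not any((h, const) in facts for h in heads):
                # Case analysis: explore one branch per disjunct.
                for h in heads:
                    yield from models(facts | {(h, const)})
                return
    yield facts


def certain_facts(data):
    """Ground facts that hold in every enumerated model."""
    return set.intersection(*models(set(data)))


data = {("Student", "alice"), ("Grad", "bob")}
by_cases = certain_facts(data)
by_datalog = saturate(data, REWRITING)
print(by_cases == by_datalog)
```

On this data set the two computations agree: `Enrolled(alice)` is a certain answer although neither `Undergrad(alice)` nor `Grad(alice)` is. The case analysis can branch once per student, so it may explore exponentially many cases, whereas the datalog program is evaluated by a single forward-chaining pass.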
In Section LABEL:sec:NegativeResults, we revisit the limits of datalog rewritability for a language as a whole and show that non-rewritability of $\mathcal{ELU}$ ontologies is independent from any complexity-theoretic assumptions. More precisely, we present an $\mathcal{ELU}$ ontology $\mathcal{T}$ for which query answering cannot be decided by a family of monotone circuits of polynomial size. If $\mathcal{T}$ were rewritable into datalog, this would contradict the result by Afrati et al. [Afrati:1995un], who proved that fact entailment in a fixed datalog program can be decided using monotone circuits of polynomial size. Thus, instead of relying on complexity arguments, we compare the lengths of proofs in $\mathcal{ELU}$ and datalog and show that the proofs in $\mathcal{ELU}$ may be considerably longer than the proofs in datalog.
In Section LABEL:sec:DatalogRewritings, we present a three-step procedure that takes a $\mathcal{SHI}$-ontology $\mathcal{T}$ and attempts to rewrite $\mathcal{T}$ into a datalog program. First, we use a novel technique to rewrite $\mathcal{T}$ into a TBox without transitivity axioms while preserving entailment of all ground atoms; this is in contrast to the standard techniques (see, e.g., [hms07reasoning]), which preserve entailments only of unary facts and of binary facts over roles that have no transitive subroles. Second, we use the algorithm by Hustadt et al. [hms07reasoning] to rewrite the resulting TBox into a disjunctive datalog program. Third, we adapt the knowledge compilation techniques by del Val [DBLP:journals/ai/Val05] and Selman and Kautz [selman1996knowledge] to transform this disjunctive program into a datalog program. The final step is not guaranteed to terminate in general; however, if it terminates, the resulting program is a rewriting of $\mathcal{T}$.
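The idea behind the third step can be sketched on a propositional toy: saturate a set of clauses under resolution and keep only the Horn clauses. The sketch below is an assumption-laden illustration in the spirit of resolution-based knowledge compilation, not the procedure of this paper, which works on first-order disjunctive datalog rules and need not terminate. The clause encoding (strings, with a leading `-` for negation) and the atoms `S`, `U`, `G`, `E` are hypothetical.

```python
from itertools import combinations


def neg(lit):
    """Complement of a literal; '-A' denotes the negation of atom 'A'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def resolvents(c1, c2):
    """All non-tautological binary resolvents of two clauses."""
    for lit in c1:
        if neg(lit) in c2:
            r = (c1 - {lit}) | (c2 - {neg(lit)})
            if not any(neg(l) in r for l in r):  # skip tautologies
                yield frozenset(r)


def saturate(clauses):
    """Close a clause set under resolution (finite in the propositional case)."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if r not in clauses:
                    new.add(r)
        if not new:
            return clauses
        clauses |= new


def is_horn(clause):
    """A clause is Horn if it has at most one positive literal."""
    return sum(1 for lit in clause if not lit.startswith("-")) <= 1


def horn_part(clauses):
    """Horn clauses contained in the resolution closure."""
    return {c for c in saturate(clauses) if is_horn(c)}


# S -> U or G,  U -> E,  G -> E   (written as clauses)
program = {
    frozenset({"-S", "U", "G"}),
    frozenset({"-U", "E"}),
    frozenset({"-G", "E"}),
}
for clause in sorted(sorted(c) for c in horn_part(program)):
    print(clause)
```

Here resolution derives the Horn clause `-S ∨ E` (that is, `S -> E`), which captures the consequence of the disjunction that matters for atomic facts, while the non-Horn clauses are discarded. In the first-order setting the closure may be infinite, which is why termination has to be established separately for particular languages.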
In Section LABEL:sec:Termination, we show that our procedure always terminates if $\mathcal{T}$ is a DL-Lite$^{\mathcal{H},+}_{bool}$-ontology, a practically relevant language that extends OWL 2 QL with transitive roles and Boolean connectives. Artale et al. [Artale09thedl-lite] proved that the data complexity of concept queries in this language is tractable (i.e., NLogSpace-complete). We extend this result to all ground queries and thus obtain a goal-oriented rewriting algorithm that may be suitable for practical use.
Our technique, like most rewriting techniques known in the literature, is based on a sound inference system and thus produces only strong rewritings, that is, rewritings entailed by the original ontology. In Section LABEL:sec:LimitsStrong we show that non-Horn ontologies exist that can be rewritten into datalog, but that have no strong rewritings. This highlights the limits of techniques based on sound inferences. It is also surprising, since all rewriting techniques for Horn fragments of OWL 2 DL known to us produce only strong rewritings.
The proofs of all of our technical results are given in