Ontologies are now widely used in various domains such as medicine [rector1995terminology, price2000snomed, ruch2008automatic], biology [Sidhu05proteinontology], chemistry [degtyarenko2008chebi], geography [mcmaster2004research, kuipers1996ontological] and many others [article], to represent conceptual knowledge in a formal and easy-to-understand manner. Constructing and maintaining such ontologies, which often contain thousands of concepts, is a multi-task effort. As these ontologies increase in size and complexity, it becomes increasingly challenging for an ontology engineer to understand which parts of the ontology cause a certain consequence to be entailed. If, for example, this consequence is an error, the ontology engineer would want to understand its precise causes, and correct it with minimal disturbance to the rest of the ontology.
To support this task, a technique known as axiom pinpointing was introduced in [Schlobach:2003:NRS:1630659.1630712]. The goal of axiom pinpointing is to identify the minimal sub-ontologies (w.r.t. set inclusion) that entail a given consequence; we call these sets MinAs. There are two basic approaches to axiom pinpointing. The black-box approach [parsia-www05] uses repeated calls to an unmodified decision procedure to find these MinAs. The glass-box approach, on the other hand, modifies the decision algorithm to generate the MinAs during one execution. In reality, glass-box methods do not explicitly compute the MinAs, but rather a compact representation of them known as the pinpointing formula. In this setting, each axiom of the ontology is labelled with a unique propositional symbol. The pinpointing formula is a (monotone) Boolean formula, satisfied exactly by those valuations which evaluate to true the labels of the axioms in the ontology which cause the entailment of the consequence. Thus, the formula points out to the user the relevant parts of the ontology for the entailment of a certain consequence, where disjunction means alternative use of the axioms and conjunction means that the axioms are jointly used.
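The black-box approach can be illustrated with a minimal Python sketch. It assumes a hypothetical monotone oracle `entails`, standing in for the unmodified decision procedure, and extracts a single MinA by deleting axioms one at a time. The names `find_one_mina` and `toy_entails`, as well as the toy oracle itself, are illustrative assumptions and do not correspond to any existing reasoner.

```python
from typing import Callable, FrozenSet, Iterable, Optional


def find_one_mina(
    ontology: Iterable[str],
    entails: Callable[[FrozenSet[str]], bool],
) -> Optional[FrozenSet[str]]:
    """Deletion-based extraction of one MinA using only oracle calls.

    `entails` must be monotone: if it holds for a set of axioms,
    it also holds for every superset of that set.
    """
    axioms = list(ontology)
    if not entails(frozenset(axioms)):
        return None  # the consequence does not follow at all
    mina = list(axioms)
    for axiom in axioms:
        candidate = [a for a in mina if a != axiom]
        # Drop the axiom permanently if the consequence survives without it.
        if entails(frozenset(candidate)):
            mina = candidate
    return frozenset(mina)


# Hypothetical oracle: the consequence follows from {ax1, ax2} or from {ax3}.
def toy_entails(sub: FrozenSet[str]) -> bool:
    return {"ax1", "ax2"} <= sub or {"ax3"} <= sub


print(find_one_mina(["ax1", "ax2", "ax3", "ax4"], toy_entails))
print(find_one_mina(["ax3", "ax1", "ax2"], toy_entails))
```

The MinA that is found depends on the order in which the axioms are tested, and the procedure issues one oracle call per axiom. Enumerating all MinAs in this style requires many more calls, which is one motivation for the compact representation discussed next.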
Axiom pinpointing can be used to enrich a decision procedure for entailment checking by further presenting to the user the axioms which cause a certain consequence. Since glass-box methods modify an existing decision procedure, they require a specification of the decision method to be studied. Previously, general methods for extending tableaux-based and automata-based decision procedures to axiom pinpointing have been studied in detail [DBLP:journals/jar/BaaderP10, DBLP:journals/logcom/BaaderP10]. Classically, automata-based decision procedures often exhibit optimal worst-case complexity, but the most efficient reasoners for standard ontology languages are tableaux-based. When dealing with pinpointing extensions one observes a similar behaviour: the automata-based axiom pinpointing approach preserves the complexity of the original method, while tableau-based axiom pinpointing is not even guaranteed to terminate in general. However, the latter are more goal-directed and lead to a better run-time in practice.
A different kind of reasoning procedure that is gaining interest is known as the consequence-based method. In this setting, rules are applied to derive explicit consequences from previously derived knowledge. Consequence-based decision procedures often enjoy optimal worst-case complexity and, more recently, they have been presented as a promising alternative to tableaux-based reasoners for standard ontology languages [DBLP:conf/dlog/CucalaGH17, Kaza09, horrocks-kr16, DBLP:journals/ai/SimancikMH14, SiKH-IJCAI11, DBLP:conf/csemws/WangH12, DBLP:conf/dlog/KazakovK14a]. Consequence-based algorithms have been previously described as simple variants of tableau algorithms [baader-ki07], and as syntactic variants of automata-based methods [HuPe17]. They share the positive complexity bounds of automata, and the goal-directed nature of tableaux.
In this work, we present a general approach to produce axiom pinpointing extensions of consequence-based algorithms. Our driving example and use case is the extension of the consequence-based algorithm for entailment checking for the prototypical ontology language $\mathcal{ALC}$ [DBLP:conf/dlog/KazakovK14a]. We show that the pinpointing extension does not change the ExpTime complexity of the consequence-based algorithm for $\mathcal{ALC}$.
2 Preliminaries
We briefly introduce the notions needed for this paper. We are interested in the problem of understanding the causes for a consequence to follow from an ontology. We consider an abstract notion of ontology and consequence relation. For the sake of clarity, however, we instantiate these notions to the description logic $\mathcal{ALC}$.
2.1 Axiom Pinpointing
To keep the discourse as general as possible, we consider an ontology language to define a class $\mathfrak{A}$ of axioms. An ontology is then a finite set of axioms; that is, a finite subset of $\mathfrak{A}$. We denote the set of all ontologies as $\mathfrak{O}$. A consequence property (or c-property for short) is a binary relation $\mathcal{P}\subseteq\mathfrak{O}\times\mathfrak{A}$ that relates ontologies to axioms. If $(\mathcal{O},\alpha)\in\mathcal{P}$, we say that $\alpha$ is a consequence of $\mathcal{O}$ or, alternatively, that $\mathcal{O}$ entails $\alpha$.
We are only interested in relations $\mathcal{P}$ that are monotonic in the sense that for any two ontologies $\mathcal{O},\mathcal{O}'$ and axiom $\alpha$ such that $\mathcal{O}\subseteq\mathcal{O}'$, if $(\mathcal{O},\alpha)\in\mathcal{P}$ then $(\mathcal{O}',\alpha)\in\mathcal{P}$. In other words, adding more axioms to an ontology can only enlarge the set of axioms that are entailed by it. For the rest of this paper, whenever we speak about a c-property, we implicitly assume that it is monotonic in this sense.
Notice that our notions of ontology and consequence property differ from previous work. In [DBLP:journals/jar/BaaderP10, DBLP:journals/logcom/BaaderP10], c-properties are defined using two different types of statements and ontologies are allowed to require additional structural constraints. The former difference is just syntactic and does not change the generality of our approach. In the latter case, our setting becomes slightly less expressive, but at the benefit of simplifying the overall notation and explanation of our methods. As we notice at the end of this paper, our results can be easily extended to the more general setting from [DBLP:journals/jar/BaaderP10, DBLP:journals/logcom/BaaderP10].
When dealing with ontology languages, one is usually interested in deciding whether an ontology $\mathcal{O}$ entails an axiom $\alpha$; that is, whether $(\mathcal{O},\alpha)\in\mathcal{P}$. In axiom pinpointing, we are interested in the more detailed question of why it is a consequence. More precisely, we want to find the minimal (w.r.t. set inclusion) sub-ontologies $\mathcal{O}'\subseteq\mathcal{O}$ such that $(\mathcal{O}',\alpha)\in\mathcal{P}$ still holds. These subsets are known in the literature as MinAs [DBLP:journals/jar/BaaderP10, DBLP:journals/logcom/BaaderP10], justifications [parsia-iswc07], or MUPS [Schlobach:2003:NRS:1630659.1630712], among many other names. Rather than enumerating all these sub-ontologies explicitly, one approach is to compute a formula, known as the pinpointing formula, that encodes them.
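Formally, suppose that every axiom $\alpha\in\mathfrak{A}$ is associated with a unique propositional variable $\mathsf{lab}(\alpha)$, and let $\mathsf{lab}(\mathcal{O})$ be the set of all the propositional variables corresponding to axioms in the ontology $\mathcal{O}$. A monotone Boolean formula $\phi$ over $\mathsf{lab}(\mathcal{O})$ is a Boolean formula using only variables in $\mathsf{lab}(\mathcal{O})$ and the connectives for conjunction ($\wedge$) and disjunction ($\vee$). The constants $\top$ and $\bot$, always evaluated to true and false, respectively, are also monotone Boolean formulae. We identify a propositional valuation with the set of variables which are true in it. For a valuation $\mathcal{V}$ and a set of axioms $\mathcal{O}$, the $\mathcal{V}$-projection of $\mathcal{O}$ is the set $\mathcal{O}_{\mathcal{V}} := \{\alpha\in\mathcal{O}\mid\mathsf{lab}(\alpha)\in\mathcal{V}\}$. Given a c-property $\mathcal{P}$ and an axiom $\alpha\in\mathfrak{A}$, a monotone Boolean formula $\phi$ over $\mathsf{lab}(\mathcal{O})$ is called a pinpointing formula for $(\mathcal{O},\alpha)$ w.r.t. $\mathcal{P}$ if for every valuation $\mathcal{V}\subseteq\mathsf{lab}(\mathcal{O})$:
$$(\mathcal{O}_{\mathcal{V}},\alpha)\in\mathcal{P} \quad\text{iff}\quad \mathcal{V}\text{ satisfies }\phi.$$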
2.2 Description Logics
Description logics (DLs) [BCNMP03] are a family of knowledge representation formalisms that have been successfully applied to represent the knowledge of many application domains, in particular from the life sciences [article]. We briefly introduce, as a prototypical example, $\mathcal{ALC}$, which is the smallest propositionally closed description logic.
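Given two disjoint sets $N_C$ and $N_R$ of concept names and role names, respectively, $\mathcal{ALC}$ concepts are defined through the grammar rule:
$$C ::= A \mid \neg C \mid C\sqcap C \mid \exists r.C,$$
where $A\in N_C$ and $r\in N_R$. A general concept inclusion (GCI) is an expression of the form $C\sqsubseteq D$, where $C, D$ are $\mathcal{ALC}$ concepts. A TBox is a finite set of GCIs.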
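The semantics of this logic is given in terms of interpretations, which are pairs of the form $\mathcal{I}=(\Delta^{\mathcal{I}},\cdot^{\mathcal{I}})$ where $\Delta^{\mathcal{I}}$ is a non-empty set called the domain, and $\cdot^{\mathcal{I}}$ is the interpretation function that maps every concept name $A\in N_C$ to a set $A^{\mathcal{I}}\subseteq\Delta^{\mathcal{I}}$ and every role name $r\in N_R$ to a binary relation $r^{\mathcal{I}}\subseteq\Delta^{\mathcal{I}}\times\Delta^{\mathcal{I}}$. The interpretation function is extended to arbitrary $\mathcal{ALC}$ concepts inductively as shown in Figure 1.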
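Following this semantics, we introduce the usual abbreviations $C\sqcup D := \neg(\neg C\sqcap\neg D)$, $\forall r.C := \neg\exists r.\neg C$, $\bot := A\sqcap\neg A$, and $\top := \neg\bot$. That is, $\top$ stands for a (DL) tautology, and $\bot$ for a contradiction. The interpretation $\mathcal{I}$ satisfies the GCI $C\sqsubseteq D$ iff $C^{\mathcal{I}}\subseteq D^{\mathcal{I}}$. It is a model of the TBox $\mathcal{T}$ iff it satisfies all the GCIs in $\mathcal{T}$.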
One of the main reasoning problems in DLs is to decide subsumption between two concepts $C, D$ w.r.t. a TBox $\mathcal{T}$; that is, to verify that every model of the TBox $\mathcal{T}$ also satisfies the GCI $C\sqsubseteq D$. If this is the case, we denote it as $\mathcal{T}\models C\sqsubseteq D$. It is easy to see that the relation $\models$ defines a c-property over the class $\mathfrak{A}$ of axioms containing all possible GCIs; in this case, an ontology is a TBox.
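The semantics above can be made concrete with a small Python sketch that computes the extension of a concept in one given finite interpretation and checks whether that interpretation satisfies a GCI. The tuple encoding of concepts, the function names, and the toy interpretation are assumptions made for illustration; the sketch covers only concept names, negation, conjunction and existential restrictions. Note that it checks satisfaction in a single interpretation, whereas subsumption quantifies over all models of the TBox.

```python
# Concepts as nested tuples (an illustrative encoding):
#   ("name", "A"), ("not", C), ("and", C, D), ("exists", "r", C)

def extension(concept, domain, concepts, roles):
    """Return the set of domain elements that belong to `concept`."""
    kind = concept[0]
    if kind == "name":
        return set(concepts.get(concept[1], set()))
    if kind == "not":
        return set(domain) - extension(concept[1], domain, concepts, roles)
    if kind == "and":
        return (extension(concept[1], domain, concepts, roles)
                & extension(concept[2], domain, concepts, roles))
    if kind == "exists":
        edges = roles.get(concept[1], set())
        filler = extension(concept[2], domain, concepts, roles)
        # d is in the extension iff it has an r-successor in the filler.
        return {d for d in domain if any((d, e) in edges for e in filler)}
    raise ValueError(f"unknown constructor: {kind!r}")


def satisfies_gci(lhs, rhs, domain, concepts, roles):
    """An interpretation satisfies lhs ⊑ rhs iff ext(lhs) is a subset of ext(rhs)."""
    return (extension(lhs, domain, concepts, roles)
            <= extension(rhs, domain, concepts, roles))


# Toy interpretation: two elements, A = {1}, B = {2}, r = {(1, 2)}.
domain = {1, 2}
concepts = {"A": {1}, "B": {2}}
roles = {"r": {(1, 2)}}
A, B = ("name", "A"), ("name", "B")

print(satisfies_gci(A, ("exists", "r", B), domain, concepts, roles))  # True
print(satisfies_gci(B, ("exists", "r", B), domain, concepts, roles))  # False
```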
The following example instantiates the basic ideas presented in this section.
Example 1.
Consider for example the TBox containing the axioms
where are the propositional variables labelling the axiom. It is easy to see that , and there are two justifications for this fact; namely, the TBoxes and . From this, it follows that is a pinpointing formula for w.r.t. .
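The relationship between justifications and a pinpointing formula can be checked mechanically on small inputs. The following Python sketch uses a hypothetical toy ontology with four axioms, identified with their labels `ax1` to `ax4` so that the projection of the ontology under a valuation is the valuation itself. The oracle `toy_entails`, the formula encoding, and all function names are assumptions for illustration; they are not the TBox of the example above.

```python
from itertools import combinations


def evaluate(formula, valuation):
    """Evaluate a monotone formula: a bool, a variable, or ("and"/"or", f, g, ...)."""
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):
        return formula in valuation
    op, *args = formula
    if op == "and":
        return all(evaluate(a, valuation) for a in args)
    if op == "or":
        return any(evaluate(a, valuation) for a in args)
    raise ValueError(f"unknown connective: {op!r}")


def valuations(variables):
    """Enumerate all valuations as sets of true variables, smallest first."""
    ordered = sorted(variables)
    for size in range(len(ordered) + 1):
        for chosen in combinations(ordered, size):
            yield frozenset(chosen)


def is_pinpointing_formula(formula, variables, entails):
    """Check the defining property: V satisfies the formula iff O_V entails."""
    return all(evaluate(formula, v) == entails(v) for v in valuations(variables))


def minas(formula, variables):
    """The MinAs are the minimal valuations that satisfy the formula."""
    models = [v for v in valuations(variables) if evaluate(formula, v)]
    return {m for m in models if not any(other < m for other in models)}


VARIABLES = {"ax1", "ax2", "ax3", "ax4"}


# Hypothetical monotone oracle: ax1 is needed together with ax2 or ax3.
def toy_entails(sub):
    return "ax1" in sub and ("ax2" in sub or "ax3" in sub)


FORMULA = ("and", "ax1", ("or", "ax2", "ax3"))

print(is_pinpointing_formula(FORMULA, VARIABLES, toy_entails))
print(minas(FORMULA, VARIABLES))
```

The brute-force enumeration is exponential in the number of axioms and is meant only to make the definition tangible.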
3 Consequence-based Algorithms
Abstracting from particularities, a consequence-based algorithm works on a set $\mathcal{S}$ of consequences, which is expanded through rule applications. Algorithms of this kind have two phases. The normalization phase first transforms all the axioms in an ontology into a suitable normal form. The saturation phase initializes the set $\mathcal{S}$ of derived consequences with the normalized ontology and applies the rules to expand it. The set $\mathcal{S}$ is often called a state. As mentioned, the initial state $\mathcal{S}_0$ contains the normalization of the input ontology $\mathcal{O}$. A rule is of the form $\mathcal{B}_0\to\mathcal{B}_1$, where $\mathcal{B}_0,\mathcal{B}_1$ are finite sets of consequences. This rule is applicable to the state $\mathcal{S}$ if $\mathcal{B}_0\subseteq\mathcal{S}$ and $\mathcal{B}_1\not\subseteq\mathcal{S}$. Its application extends $\mathcal{S}$ to $\mathcal{S}\cup\mathcal{B}_1$. The state $\mathcal{S}$ is saturated if no rule is applicable to it. The method terminates if $\mathcal{S}$ is saturated after finitely many rule applications, independently of the rule application order chosen. For the rest of this section and most of the following, we assume that the input ontology is already in this normal form, and focus only on the second phase.
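Given a rule $R=\mathcal{B}_0\to\mathcal{B}_1$, we use $\mathsf{pre}(R)=\mathcal{B}_0$ and $\mathsf{res}(R)=\mathcal{B}_1$ to denote the sets of premises that trigger $R$ and of consequences resulting from its application, respectively. If the state $\mathcal{S}'$ is obtained from $\mathcal{S}$ through the application of the rule $R$, we write $\mathcal{S}\to_R\mathcal{S}'$, and denote $\mathcal{S}\to\mathcal{S}'$ if the precise rule used is not relevant.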
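The saturation phase can be sketched in a few lines of Python. The sketch assumes that rules are given as finitely many ground pairs of premise and result sets, whereas actual consequence-based algorithms use rule schemas that are instantiated on demand; the function name and the toy rules over the consequences `a` to `e` are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (premises, results)


def saturate(initial: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Apply rules until no rule is applicable to the state."""
    state = set(initial)
    changed = True
    while changed:
        changed = False
        for premises, results in rules:
            # Applicable: all premises are present, some result is missing.
            if premises <= state and not results <= state:
                state |= results
                changed = True
    return state


# Toy rules over abstract consequences (purely illustrative).
RULES: List[Rule] = [
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"a", "b"}), frozenset({"c"})),
    (frozenset({"d"}), frozenset({"e"})),
]

print(sorted(saturate({"a"}, RULES)))  # ['a', 'b', 'c']
```

Because rule applications only add consequences and applicability requires a missing result, the loop stops as soon as the state is saturated; on this toy input the result is the same for any ordering of `RULES`.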
Consequence-based algorithms derive, in a single execution, several axioms that are entailed by the input ontology. Obviously, in general they cannot generate all possible entailed axioms, as such a set may be infinite (e.g., in the case of $\mathcal{ALC}$). Thus, to define correctness, we need to specify for every ontology $\mathcal{O}$ a finite set $\delta(\mathcal{O})$ of derivable consequences of $\mathcal{O}$.
Definition 1 (Correctness).
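A consequence-based algorithm is correct for the consequence property $\mathcal{P}$ if for every ontology $\mathcal{O}$, the following two conditions hold: (i) it terminates, and (ii) if $\mathcal{O}\to^{*}\mathcal{S}$ and $\mathcal{S}$ is saturated, then for every derivable consequence $\alpha\in\delta(\mathcal{O})$ it follows that $(\mathcal{O},\alpha)\in\mathcal{P}$ iff $\alpha\in\mathcal{S}$.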
That is, the algorithm is correct for a property if it terminates and is sound and complete w.r.t. the finite set of derivable consequences .
Notice that the definition of correctness requires that the resulting set of consequences obtained from the application of the rules is always the same, independently of the order in which the rules are applied. In other words, if , , and are both saturated, then . This is a fundamental property that will be helpful for showing correctness of the pinpointing extensions in the next section.
A well-known example of a consequence-based algorithm is the reasoning method from [SiKH-IJCAI11]. To describe this algorithm we need some notation. A literal is either a concept name or a negated concept name. Let denote (possibly empty) conjunctions of literals, and are (possibly empty) disjunctions of concept names. For simplicity, we treat these conjunctions and disjunctions as sets. The normalization phase transforms all GCIs to be of the form:
For a given TBox , the set of derivable consequences contains all GCIs of the form and . The saturation phase initializes to contain the axioms in the (normalized) TBox, and applies the rules from Table 1 until a saturated state is found.
After termination, one can check that for every derivable consequence it holds that iff ; that is, this algorithm is correct for the property [SiKH-IJCAI11].
Example 2.
Recall the TBox from Example 1. Notice that all axioms in this TBox are already in normal form; hence the normalization step does not modify it. The consequence-based algorithm starts with and applies the rules until saturation. One possible execution of the algorithm is
where contains and the result of adding all the tautologies generated by the application of Rule over it (see Figure 2).
Since rule applications only extend the set of consequences, we depict exclusively the newly added consequence; e.g., the first rule application is in fact representing . When the execution of the method terminates, the set of consequences contains ; hence we can conclude that this subsumption follows from . Notice that other consequences (e.g., ) are also derived from the same execution.
For the rest of this paper, we consider an arbitrary, but fixed, consequence-based algorithm, that is correct for a given c-property .
4 The Pinpointing Extension
Our goal is to extend consequence-based algorithms from the previous section to methods that compute pinpointing formulae for their consequences. We achieve this by modifying the notion of states, and the rule applications on them. Recall that every axiom $\alpha$ in the class $\mathfrak{A}$ (and hence, also every axiom in the ontology $\mathcal{O}$) is labelled with a unique propositional variable $\mathsf{lab}(\alpha)$. In a similar manner, we consider sets of consequences that are labelled with a monotone Boolean formula. We use the notation $\mathcal{S}^{\mathsf{pin}}$ to indicate that the elements in the set $\mathcal{S}$ are labelled in this way, and use $\alpha^{\phi}\in\mathcal{S}^{\mathsf{pin}}$ to express that the consequence $\alpha$, labelled with the formula $\phi$, belongs to $\mathcal{S}^{\mathsf{pin}}$. A pinpointing state is a set of labelled consequences. We assume that each consequence in this set is labelled with only one formula. For a set of labelled consequences $\mathcal{S}^{\mathsf{pin}}$ and a set of (unlabelled) consequences $\mathcal{D}$, we define $\mathsf{fm}(\mathcal{D},\mathcal{S}^{\mathsf{pin}}) := \bigwedge_{\alpha\in\mathcal{D}}\phi_{\alpha}$, where $\phi_{\alpha}=\phi$ if $\alpha^{\phi}\in\mathcal{S}^{\mathsf{pin}}$ and $\phi_{\alpha}=\bot$ otherwise.
A consequence-based algorithm induces a pinpointing consequence-based algorithm by modifying the notion of rule application, and dealing with pinpointing states, instead of classical states, through a modification of the formulae labelling the derived consequences.
Definition 2 (Pinpointing Application).
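The rule $R$ is pinpointing applicable to the pinpointing state $\mathcal{S}^{\mathsf{pin}}$ if $\mathsf{fm}(\mathsf{pre}(R),\mathcal{S}^{\mathsf{pin}})\not\models\mathsf{fm}(\mathsf{res}(R),\mathcal{S}^{\mathsf{pin}})$. The pinpointing application of this rule modifies $\mathcal{S}^{\mathsf{pin}}$ to:
$$\{\alpha^{\phi_{\alpha}\vee\psi}\mid\alpha\in\mathsf{res}(R),\ \psi=\mathsf{fm}(\mathsf{pre}(R),\mathcal{S}^{\mathsf{pin}})\}\ \cup\ \{\alpha^{\phi}\in\mathcal{S}^{\mathsf{pin}}\mid\alpha\notin\mathsf{res}(R)\},$$
where $\mathsf{fm}(\mathcal{D},\mathcal{S}^{\mathsf{pin}})$ is the conjunction of the labels that the consequences in $\mathcal{D}$ carry in $\mathcal{S}^{\mathsf{pin}}$, and $\phi_{\alpha}$ is the label of $\alpha$ in $\mathcal{S}^{\mathsf{pin}}$; in both cases a consequence that does not occur in $\mathcal{S}^{\mathsf{pin}}$ contributes the label $\bot$.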
The pinpointing state is pinpointing saturated if no rule is pinpointing applicable to it.
We denote as the fact that is obtained from the pinpointing application of the rule to . As before, we drop the subscript if the name of the rule is irrelevant and write simply . The pinpointing extension starts, as the classical one, with the set of all normalized axioms. For the rest of this section, we assume that the input ontology is already normalized, and hence each axiom in the initial pinpointing state is labelled with its corresponding propositional variable. In the next section we show how to deal with normalization.
Example 3.
Consider again the TBox from Example 1. At the beginning of the execution of the pinpointing algorithm, the set of consequences is the TBox, with each axiom labelled by the unique propositional variable representing it; that is . A pinpointing application of Rule 1 adds the new consequence , where the tautology labelling this consequence arises from the fact that rule 1 has no premises. At this point, one can pinpointing apply Rule 5 with
(see the solid arrow in Figure 2). In this case, and because the consequence does not belong to yet. Hence , and the rule is indeed pinpointing applicable. The pinpointing application of this rule adds the new labelled consequence to . Then, Rule 3 becomes pinpointing applicable with
which adds to the set of consequences. Then, Rule 7 over the set of premises
yields the new consequence .
Notice that, at this point Rule 6 is not applicable in the classical case over the set of premises because its (regular) application would add the consequence that was already derived. However,
hence, the rule is in fact pinpointing applicable. The pinpointing application of this Rule 6 substitutes the labelled consequence with the consequence . The pinpointing extension will then continue applying rules until a saturated state is reached. This execution is summarized in Figure 3.
At that point, the set of labelled consequences will contain, among others, . The label of this consequence corresponds to the pinpointing formula that was computed in Example 1.
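The behaviour described in this example, where an already derived consequence has its label weakened by a later rule application, can be reproduced with a small Python sketch. It assumes that a rule is pinpointing applicable when the conjunction of the premise labels does not entail the conjunction of the result labels (with a missing consequence counting as falsity), and that its application disjoins the premise conjunction to each result label. Labels are stored as antichains of variable sets, i.e., monotone formulae in minimal disjunctive normal form; this representation, the ground toy rule and all names are illustrative assumptions, not the rules of Table 1.

```python
def minimize(terms):
    """Keep only the subset-minimal conjunctions (absorption)."""
    terms = set(terms)
    return frozenset(t for t in terms if not any(o < t for o in terms))


BOTTOM = frozenset()              # empty disjunction: false
TOP = frozenset({frozenset()})    # one empty conjunction: true


def conj(f, g):
    return minimize(a | b for a in f for b in g)


def disj(f, g):
    return minimize(f | g)


def entails(f, g):
    """f entails g iff every conjunction of f contains a conjunction of g."""
    return all(any(b <= a for b in g) for a in f)


def fm(consequences, state):
    """Conjunction of the labels of `consequences`; absent ones count as false."""
    result = TOP
    for c in consequences:
        result = conj(result, state.get(c, BOTTOM))
    return result


def pinpointing_saturate(labelled_axioms, rules):
    """labelled_axioms maps each normalized axiom to its propositional variable."""
    state = {ax: frozenset({frozenset({var})})
             for ax, var in labelled_axioms.items()}
    changed = True
    while changed:
        changed = False
        for premises, results in rules:
            psi = fm(premises, state)
            if not entails(psi, fm(results, state)):   # pinpointing applicable
                for c in results:
                    state[c] = disj(state.get(c, BOTTOM), psi)
                changed = True
    return state


# Toy input: A<=C is both stated (ax3) and derivable from ax1 and ax2.
AXIOMS = {"A<=B": "ax1", "B<=C": "ax2", "A<=C": "ax3"}
RULES = [(frozenset({"A<=B", "B<=C"}), frozenset({"A<=C"}))]

final = pinpointing_saturate(AXIOMS, RULES)
print(final["A<=C"])  # encodes ax3 ∨ (ax1 ∧ ax2)
```

In the classical setting the toy rule would not be applicable, since `A<=C` is already in the state; in the pinpointing setting it fires once and changes the label from `ax3` to the disjunction of `ax3` and `ax1 ∧ ax2`.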
Notice that if a rule is applicable to some state , then it is also pinpointing applicable to it. This holds because the regular applicability condition requires that at least one consequence in should not exist already in the state , which is equivalent to having the consequence . Indeed, we used this fact in the first pinpointing rule applications of Example 3. If the consequence-based algorithm is correct, then it follows by definition that for any saturated state obtained by a sequence of rule applications from , iff . Conversely, as shown next, every consequence created by a pinpointing rule application is also generated by a regular rule application. First, we extend the notion of a -projection to sets of consequences (i.e., states) in the obvious manner: .
Lemma 1.
Let be pinpointing states and let be a valuation. If then .
We show that if then or , where is a rule. If does not satisfy then since the labels of the newly added assertions are not satisfied by , and the disjunction with does not change the evaluation of the modified labels under . On the other hand, if satisfies then . If then . Otherwise, again we have . ∎
Since all the labels are monotone Boolean formulae, it follows that the valuation that makes every propositional variable true satisfies all labels, and hence for every pinpointing state , . Lemma 1 hence entails that the pinpointing extension of the consequence-based algorithm does not create new consequences, but only labels these consequences. Termination of the pinpointing extension then follows from the termination of the consequence-based algorithm and the condition for pinpointing rule application, which entails that, whenever a rule is pinpointing applied, the set of labelled consequences is necessarily modified either by adding a new consequence, or by modifying the label of at least one existing consequence to a weaker (i.e., more general) monotone Boolean formula. Since, up to logical equivalence, there are only finitely many monotone Boolean formulae over $\mathsf{lab}(\mathcal{O})$, every label can be changed only finitely many times.
It is in fact possible to get a better understanding of the running time of the pinpointing extension of a consequence-based algorithm. Suppose that, on input $\mathcal{O}$, the consequence-based algorithm stops after at most $f(\mathcal{O})$ rule applications. Since every rule application must add at least one consequence to the state, the saturated state reached by this algorithm will have at most $f(\mathcal{O})$ consequences. Consider now the pinpointing extension of this algorithm. We know, from the previous discussion, that this pinpointing extension generates the same set of consequences. Moreover, since there are $2^{|\mathcal{O}|}$ possible valuations over $\mathsf{lab}(\mathcal{O})$, and every pinpointing rule application that does not add a new consequence must generalize at least one formula, that is, add at least one satisfying valuation to it, the label of each consequence can be modified at most $2^{|\mathcal{O}|}$ times. Overall, this means that the pinpointing extension stops after at most $2^{|\mathcal{O}|}\cdot f(\mathcal{O})$ rule applications. We now formalize this result.
Theorem 4.1.
If a consequence-based algorithm stops after at most $f(\mathcal{O})$ rule applications, then its pinpointing extension stops after at most $2^{|\mathcal{O}|}\cdot f(\mathcal{O})$ rule applications.
Another important property of the pinpointing extension is that saturatedness of a state is preserved under projections.
Lemma 2.
Let be a pinpointing state and a valuation. If is pinpointing saturated then is saturated.
Suppose there is a rule such that is applicable to . This means that and . We show that is pinpointing applicable to . Since , satisfies . As , there is such that either or but does not satisfy . In the former case, is clearly pinpointing applicable to . In the latter, since satisfies but not . ∎
We can now show that the pinpointing extension of a consequence-based algorithm is indeed a pinpointing algorithm; that is, that when a saturated pinpointing state is reached from rule applications starting from , then for every , is a pinpointing formula for w.r.t. .
Theorem 4.2 (Correctness of Pinpointing).
Let be a c-property on axiomatized inputs for and . Given a correct consequence-based algorithm for , for every axiomatized input , where is normalized, then
if , , and is pinpointing saturated, then is a pinpointing formula for and .
We want to show that is a pinpointing formula for and . That is, for every valuation : iff satisfies .
Assume that , i.e., , and let . Since terminates on every input, there is a saturated state such that . Completeness of then implies that . By assumption, and is pinpointing saturated. By Lemma 1 it follows that , and by Lemma 2, is saturated. Hence, since is correct, . This implies that because .
Conversely, suppose that satisfies . By assumption, , , and is saturated. By Lemma 1, . Since satisfies , . Then, by soundness of , . ∎
As was the case for classical consequence-based algorithms, their pinpointing extensions can apply the rules in any desired order. The notion of correctness of consequence-based algorithms guarantees that a saturated state will always be found, and the result will be the same, regardless of the order in which the rules are applied. We have previously seen that termination also transfers to the pinpointing extensions. Theorem 4.2 also shows that the formulae associated with a derived consequence are always equivalent.
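Let $\mathcal{S}_1^{\mathsf{pin}},\mathcal{S}_2^{\mathsf{pin}}$ be two pinpointing saturated states, $\mathcal{O}$ an ontology, and $\alpha$ a consequence such that $\alpha^{\phi}\in\mathcal{S}_1^{\mathsf{pin}}$ and $\alpha^{\psi}\in\mathcal{S}_2^{\mathsf{pin}}$. If both $\mathcal{S}_1^{\mathsf{pin}}$ and $\mathcal{S}_2^{\mathsf{pin}}$ are obtained from $\mathcal{O}$ through pinpointing rule applications, then $\phi\equiv\psi$.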
To finalize this section, we consider again our running example of deciding subsumption in $\mathcal{ALC}$ described in Section 3. It terminates after a number of rule applications that is exponential in the size of the input TBox $\mathcal{T}$. Notice that every pinpointing rule application requires an entailment test between two monotone Boolean formulae, which can be decided in non-deterministic polynomial time in $|\mathcal{T}|$. Thus, it follows from Theorem 4.1 that the pinpointing extension of the consequence-based algorithm for $\mathcal{ALC}$ runs in exponential time.
Let $\mathcal{T}$ be an $\mathcal{ALC}$ TBox, and $C, D$ two $\mathcal{ALC}$ concepts. A pinpointing formula for $C\sqsubseteq D$ w.r.t. $\mathcal{T}$ is computable in exponential time.
5 Dealing with Normalization
Throughout the last two sections, we have disregarded the first phase of the consequence-based algorithms in which the axioms in the input ontology are transformed into a suitable normal form. In a nutshell, the normalization phase takes every axiom in the ontology and substitutes it by a set of simpler axioms that are, in combination, equivalent to the original one w.r.t. the set of derivable consequences. For example, in the axiom is not in normal form. During the normalization phase, it would then be substituted by the two axioms , , which in combination provide the exact same constraints as the original axiom.
Obviously, in the context of pinpointing, we are interested in finding the set of original axioms that cause the consequence of interest, and not those in normal form; in fact, normalization is an internal process of the algorithm, and the user should be agnostic to the internal structures used. Hence, we need to find a way to track the original axioms.
To solve this, we slightly modify the initialization of the pinpointing extension. Recall from the previous section that, if the input ontology is already in normal form, then we initialize the algorithm with the state that contains exactly that ontology, where every axiom is labelled with the unique propositional variable that represents it. If the ontology is not originally in normal form, then it is first normalized. In this case, we set as the initial state the newly normalized ontology, but every axiom is labelled with the disjunction of the variables representing the axioms that generated it. The following example explains this idea.
Example 4.
Consider a variant of the TBox from Example 1 that is now formed by the three axioms
Obviously, the first axiom is not in normal form, but can be normalized by substituting it with the two axioms , . Thus, the normalization step yields the same TBox from Example 1. However, instead of using different propositional variables to label these two axioms, they just inherit the label from the axiom that generated them; in this case . Thus, the pinpointing algorithm is initialized with
Following the same process as in Example 3, we see that we can derive the consequence . Hence is a pinpointing formula for w.r.t. . It can be easily verified that this is in fact the case.
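The modified initialization can be sketched in Python as follows. The sketch assumes a hypothetical `normalize` function that maps an original axiom to the normalized axioms it generates; here axioms are encoded as a left-hand side with a tuple of right-hand side conjuncts, and normalization simply splits the conjunction. This encoding and all names are illustrative assumptions, not the normal form used by the algorithm described earlier. Each normalized axiom receives the set of variables of all original axioms that generate it, read as a disjunction.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Original = Tuple[str, Tuple[str, ...]]   # (lhs, conjuncts of the rhs)
Normalized = Tuple[str, str]             # (lhs, single rhs name)


def split_conjunction(axiom: Original) -> Iterable[Normalized]:
    """Toy normalization: A ⊑ B1 ⊓ ... ⊓ Bn becomes A ⊑ B1, ..., A ⊑ Bn."""
    lhs, conjuncts = axiom
    return [(lhs, rhs) for rhs in conjuncts]


def initial_labels(
    ontology: Dict[str, Original],
    normalize: Callable[[Original], Iterable[Normalized]],
) -> Dict[Normalized, FrozenSet[str]]:
    """Label each normalized axiom with the disjunction (stored as a set)
    of the variables of the original axioms that generate it."""
    labels: Dict[Normalized, set] = {}
    for variable, axiom in ontology.items():
        for normal_form in normalize(axiom):
            labels.setdefault(normal_form, set()).add(variable)
    return {nf: frozenset(vs) for nf, vs in labels.items()}


# ax1 is not in normal form; ax2 repeats one of the axioms it generates.
ONTOLOGY = {"ax1": ("A", ("B", "C")), "ax2": ("A", ("B",))}

for normal_form, label in initial_labels(ONTOLOGY, split_conjunction).items():
    print(normal_form, sorted(label))
```

In this toy input the normalized axiom `("A", "B")` is generated by both original axioms, so its initial label is the disjunction of `ax1` and `ax2`, while `("A", "C")` inherits only `ax1`.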
Thus, the normalization phase affects neither the correctness nor the complexity of the pinpointing extension of a consequence-based algorithm.
6 Conclusions
We presented a general framework to extend consequence-based algorithms with axiom pinpointing. These algorithms often enjoy optimal upper bound complexity and can be efficiently implemented in practice. Our focus and use case in this paper is the prototypical ontology language $\mathcal{ALC}$. We emphasize that this is only one of many consequence-based algorithms available. The completion-based algorithm for [baader-ijcai05] is obtained by restricting the assertions to be of the form and with and , and adding one rule to handle role constructors. Other examples of consequence-based methods include the LTUR approach for Horn clauses [minoux-ipl88], and methods for more expressive and Horn DLs [Kaza09, horrocks-kr16, kazakov-jar14].
Understanding the axiomatic causes for a consequence, and in particular the pinpointing formula, has importance beyond MinA enumeration. For example, the pinpointing formula also encodes all the ways to repair an ontology [arif-ki15]. Depending on the application at hand, a simpler version of the formula can be computed, potentially more efficiently. This idea has already been employed to find good approximations for MinAs [BaaSun-KRMED-08] and lean kernels [PMIM17] efficiently.
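One way to read the connection to repairs is through hitting sets: removing a set of axioms eliminates the consequence exactly when that set intersects every MinA. The following Python sketch enumerates the minimal such sets by brute force. It is an illustration under this standard hitting-set view, with hypothetical MinAs, and is not the method of [arif-ki15].

```python
from itertools import combinations


def minimal_hitting_sets(minas):
    """Return all subset-minimal sets of axioms that intersect every MinA."""
    minas = [frozenset(m) for m in minas]
    universe = sorted(set().union(*minas))
    found = []
    # Candidates are tried by increasing size, so any candidate that
    # contains a previously found hitting set is not minimal.
    for size in range(len(universe) + 1):
        for chosen in combinations(universe, size):
            candidate = frozenset(chosen)
            hits_all = all(candidate & m for m in minas)
            if hits_all and not any(h <= candidate for h in found):
                found.append(candidate)
    return set(found)


# Hypothetical MinAs, corresponding to the formula ax1 ∧ (ax2 ∨ ax3).
MINAS = [{"ax1", "ax2"}, {"ax1", "ax3"}]

print(minimal_hitting_sets(MINAS))
```

For the toy MinAs, the consequence can be removed either by deleting `ax1` alone or by deleting both `ax2` and `ax3`.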
As future work, it would be interesting to investigate how algorithms for query answering in an ontology-based data access setting can be extended with the pinpointing technique. The pinpointing formula in this case could also be seen as a provenance polynomial, as introduced by Green et al. [Green07-provenance-seminal] in database theory. Another direction is to investigate axiom pinpointing in decision procedures for non-monotonic reasoning, where one would also expect the presence of negations in the pinpointing formula.