## 1 Introduction

Ontologies are now widely used in domains such as medicine [rector1995terminology, price2000snomed, ruch2008automatic], biology [Sidhu05proteinontology], chemistry [degtyarenko2008chebi], geography [mcmaster2004research, kuipers1996ontological], and many others [article], to represent conceptual knowledge in a formal and easy-to-understand manner. Constructing and maintaining such ontologies, which often contain thousands of concepts, is a multi-task effort. As these ontologies increase in size and complexity, it becomes increasingly challenging for an ontology engineer to understand which parts of the ontology cause a certain consequence to be entailed. If, for example, this consequence is an error, the ontology engineer would want to understand its precise causes and correct it with minimal disturbance to the rest of the ontology.

To support this task, a technique known as
*axiom pinpointing* was
introduced in [Schlobach:2003:NRS:1630659.1630712].
The goal of axiom pinpointing is to identify the minimal sub-ontologies (w.r.t. set inclusion)
that entail a given consequence; we call these sets *MinAs*.
There are two basic approaches to axiom pinpointing.
The *black-box approach* [parsia-www05] uses repeated calls to an unmodified decision procedure
to find these MinAs. The *glass-box approach*, on the other hand, modifies the
decision algorithm to generate the MinAs during one execution. In reality, glass-box methods
do not explicitly compute the MinAs, but rather a compact representation of them known as
the *pinpointing formula*.
In this setting, each axiom of the ontology is labelled with a unique
propositional symbol. The pinpointing formula is a (monotone) Boolean
formula, satisfied exactly by those valuations which evaluate to true
the labels of the axioms in the ontology which cause the entailment of
the consequence. Thus, the formula points out to the user the relevant parts of the
ontology for the entailment of a certain consequence, where disjunction means
alternative use of the axioms and conjunction means that
the axioms are jointly used.
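To make the black-box idea concrete, the following is a minimal sketch of a deletion-based procedure that shrinks an ontology to a single MinA using only repeated calls to an unmodified, monotone entailment test. It is an illustration under stated assumptions and not necessarily the method of [parsia-www05]: the names `find_one_mina` and `toy_entails`, and the toy oracle itself, are hypothetical.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

Axiom = Hashable


def find_one_mina(ontology: Iterable[Axiom],
                  entails: Callable[[FrozenSet[Axiom]], bool]) -> FrozenSet[Axiom]:
    """Shrink `ontology` to one subset-minimal set that still passes `entails`.

    `entails` is used as a black box and is assumed to be monotone.
    """
    current = list(ontology)
    if not entails(frozenset(current)):
        raise ValueError("the consequence does not follow from the ontology")
    for axiom in list(current):
        candidate = [a for a in current if a != axiom]
        if entails(frozenset(candidate)):
            current = candidate  # the axiom is not needed; drop it for good
    return frozenset(current)


# Hypothetical toy oracle: the consequence follows exactly from the supersets
# of {ax1, ax2} and of {ax1, ax3}.
def toy_entails(subset: FrozenSet[Axiom]) -> bool:
    return {"ax1", "ax2"} <= subset or {"ax1", "ax3"} <= subset


mina = find_one_mina(["ax1", "ax2", "ax3", "ax4"], toy_entails)
```

The procedure needs one oracle call per axiom and returns only one MinA; which one it finds depends on the order in which the axioms are tested.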

Axiom pinpointing can be used to enrich a decision procedure for entailment checking by further presenting to the user the axioms which cause a certain consequence. Since glass-box methods modify an existing decision procedure, they require a specification of the decision method to be studied. Previously, general methods for extending tableaux-based and automata-based decision procedures to axiom pinpointing have been studied in detail [DBLP:journals/jar/BaaderP10, DBLP:journals/logcom/BaaderP10]. Classically, automata-based decision procedures often exhibit optimal worst-case complexity, but the most efficient reasoners for standard ontology languages are tableaux-based. When dealing with pinpointing extensions one observes a similar behaviour: the automata-based axiom pinpointing approach preserves the complexity of the original method, while tableau-based axiom pinpointing is not even guaranteed to terminate in general. However, the latter are more goal-directed and lead to a better run-time in practice.

A different kind of reasoning procedure that is gaining interest is known as the consequence-based method. In this setting, rules are applied to derive explicit consequences from previously derived knowledge. Consequence-based decision procedures often enjoy optimal worst-case complexity and, more recently, they have been presented as a promising alternative for tableaux-based reasoners for standard ontology languages [DBLP:conf/dlog/CucalaGH17, Kaza09, horrocks-kr16, DBLP:journals/ai/SimancikMH14, SiKH-IJCAI11, DBLP:conf/csemws/WangH12, DBLP:conf/dlog/KazakovK14a]. Consequence-based algorithms have been previously described as simple variants of tableau algorithms [baader-ki07], and as syntactic variants of automata-based methods [HuPe17]. They share the positive complexity bounds of automata, and the goal-directed nature of tableaux.

In this work, we present a general approach to produce axiom pinpointing extensions of consequence-based algorithms. Our driving example and use case is the extension of the consequence-based algorithm for entailment checking for the prototypical ontology language $\mathcal{ALC}$ [DBLP:conf/dlog/KazakovK14a]. We show that the pinpointing extension does not change the ExpTime complexity of the consequence-based algorithm for $\mathcal{ALC}$.

## 2 Preliminaries

We briefly introduce the notions needed for this paper. We are interested in the problem of understanding the causes for a consequence to follow from an ontology. We consider an abstract notion of ontology and consequence relation. For the sake of clarity, however, we instantiate these notions to the description logic $\mathcal{ALC}$.

### 2.1 Axiom Pinpointing

To keep the discourse as general as possible, we consider an *ontology language* to define a class $\mathfrak{A}$
of *axioms*. An *ontology* is then a finite set of axioms; that is, a finite subset of $\mathfrak{A}$. We
denote the set of all ontologies as $\mathfrak{O}$.
A *consequence property* (or *c-property* for short) is a binary relation ${\models} \subseteq \mathfrak{O}\times\mathfrak{A}$
that relates ontologies to axioms.
If $\mathcal{O}\models\alpha$, we say that $\alpha$ is a *consequence* of $\mathcal{O}$ or alternatively, that
$\mathcal{O}$ *entails* $\alpha$.

We are only interested in relations that are monotonic in the sense that for any two ontologies $\mathcal{O},\mathcal{O}'$ and axiom $\alpha$ such that $\mathcal{O}\subseteq\mathcal{O}'$, if $\mathcal{O}\models\alpha$ then $\mathcal{O}'\models\alpha$. In other words, adding more axioms to an ontology will only increase the set of axioms that are entailed from it. For the rest of this paper, whenever we speak about a c-property, we implicitly assume that it is monotonic in this sense.

Notice that our notions of ontology and consequence property differ from previous work. In [DBLP:journals/jar/BaaderP10, DBLP:journals/logcom/BaaderP10], c-properties are defined using two different types of statements and ontologies are allowed to require additional structural constraints. The former difference is just syntactic and does not change the generality of our approach. In the latter case, our setting becomes slightly less expressive, but at the benefit of simplifying the overall notation and explanation of our methods. As we notice at the end of this paper, our results can be easily extended to the more general setting from [DBLP:journals/jar/BaaderP10, DBLP:journals/logcom/BaaderP10].

When dealing with ontology languages, one is usually interested in deciding whether an ontology $\mathcal{O}$ entails an
axiom $\alpha$; that is, whether $\mathcal{O}\models\alpha$. In axiom pinpointing, we are interested in the
more detailed question of *why* it is a consequence. More precisely, we want to find the minimal
(w.r.t. set inclusion) sub-ontologies $\mathcal{O}'\subseteq\mathcal{O}$ such that $\mathcal{O}'\models\alpha$ still holds. These
subsets are known in the literature as *MinAs* [DBLP:journals/jar/BaaderP10, DBLP:journals/logcom/BaaderP10],
*justifications* [parsia-iswc07], or
*MUPS* [Schlobach:2003:NRS:1630659.1630712], among many other
names. Rather than enumerating all these sub-ontologies explicitly, one approach is to
compute a formula, known as the pinpointing formula, that encodes them.

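Formally, suppose that every axiom $\alpha\in\mathfrak{A}$ is associated with a unique propositional variable
$\mathsf{lab}(\alpha)$, and let $\mathsf{lab}(\mathcal{O})$ be the set of all the propositional variables corresponding to axioms
in the ontology $\mathcal{O}$.
A *monotone Boolean formula* $\phi$ over $\mathsf{lab}(\mathcal{O})$
is a Boolean formula using only variables in $\mathsf{lab}(\mathcal{O})$ and the
connectives for conjunction ($\wedge$) and disjunction ($\vee$). The constants
$\top$ and $\bot$, always evaluated to true and false, respectively, are
also monotone Boolean formulae. We identify a propositional
valuation with the set of variables which are true in it.
For a valuation $\mathcal{V}$ and a set of axioms $\mathcal{O}$, the *$\mathcal{V}$-projection of $\mathcal{O}$*
is the set $\mathcal{O}_\mathcal{V} := \{\alpha\in\mathcal{O} \mid \mathsf{lab}(\alpha)\in\mathcal{V}\}$.
Given a c-property $\models$ and an axiom $\alpha\in\mathfrak{A}$,
a monotone Boolean formula $\phi$ over $\mathsf{lab}(\mathcal{O})$
is called a *pinpointing formula* for $\alpha$ w.r.t. $\mathcal{O}$
if for every valuation $\mathcal{V}\subseteq\mathsf{lab}(\mathcal{O})$:

$$\mathcal{O}_\mathcal{V}\models\alpha \quad\text{iff}\quad \mathcal{V} \text{ satisfies } \phi.$$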
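As an illustration of how a pinpointing formula encodes the MinAs, the following sketch evaluates a monotone Boolean formula under a valuation (identified with the set of variables it makes true) and enumerates the subset-minimal satisfying valuations by brute force. The nested-tuple encoding of formulas, the function names, and the sample formula over the variables `p`, `q`, `r` are assumptions made for this example only.

```python
from itertools import combinations


def satisfies(valuation, formula):
    """Evaluate a monotone Boolean formula under a valuation (a set of variables)."""
    kind = formula[0]
    if kind == "var":
        return formula[1] in valuation
    if kind == "top":
        return True
    if kind == "bot":
        return False
    if kind == "and":
        return all(satisfies(valuation, sub) for sub in formula[1:])
    if kind == "or":
        return any(satisfies(valuation, sub) for sub in formula[1:])
    raise ValueError(f"unknown connective: {kind}")


def minimal_valuations(formula, variables):
    """Return the subset-minimal valuations over `variables` that satisfy `formula`."""
    minimal = []
    # Candidates are visited by increasing size, so every satisfying candidate
    # that is not a superset of an earlier result is itself minimal.
    for size in range(len(variables) + 1):
        for combo in combinations(sorted(variables), size):
            candidate = frozenset(combo)
            if any(found <= candidate for found in minimal):
                continue
            if satisfies(candidate, formula):
                minimal.append(candidate)
    return minimal


# Hypothetical formula: p AND (q OR r)
phi = ("and", ("var", "p"), ("or", ("var", "q"), ("var", "r")))
minas = minimal_valuations(phi, {"p", "q", "r"})
```

Each minimal valuation corresponds to one MinA: the set of axioms whose labels it makes true. The enumeration is exponential in the number of variables and is meant only to illustrate the definition.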

### 2.2 Description Logics

Description logics (DLs) [BCNMP03] are a family of knowledge representation formalisms that have been successfully applied to represent the knowledge of many application domains, in particular from the life sciences [article]. We briefly introduce, as a prototypical example, $\mathcal{ALC}$, which is the smallest propositionally closed description logic.

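Given two disjoint sets $N_C$ and $N_R$ of *concept names* and *role names*, respectively,
$\mathcal{ALC}$ *concepts* are defined through the grammar rule:

$$C ::= A \mid \neg C \mid C\sqcap C \mid \exists r.C,$$

where $A\in N_C$ and $r\in N_R$. A *general concept inclusion* (GCI) is an expression of the form
$C\sqsubseteq D$, where $C,D$ are $\mathcal{ALC}$ concepts. A *TBox* is a finite set of GCIs.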

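The semantics of this logic is given in terms of *interpretations*, which are pairs of the form
$\mathcal{I}=(\Delta^\mathcal{I},\cdot^\mathcal{I})$ where $\Delta^\mathcal{I}$ is a finite set called the *domain*, and
$\cdot^\mathcal{I}$ is the *interpretation function* that maps every concept name $A\in N_C$ to a set
$A^\mathcal{I}\subseteq\Delta^\mathcal{I}$ and every role name $r\in N_R$ to a binary relation
$r^\mathcal{I}\subseteq\Delta^\mathcal{I}\times\Delta^\mathcal{I}$. The interpretation function is extended to arbitrary concepts inductively as shown in Figure 1.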

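Following this semantics, we introduce the usual abbreviations $C\sqcup D := \neg(\neg C\sqcap\neg D)$,
$\forall r.C := \neg\exists r.\neg C$, $\bot := A\sqcap\neg A$, and $\top := \neg\bot$. That is, $\top$ stands for
a (DL) tautology, and $\bot$ for a contradiction.
The interpretation $\mathcal{I}$ *satisfies* the GCI $C\sqsubseteq D$ iff $C^\mathcal{I}\subseteq D^\mathcal{I}$. It is a
*model* of the TBox $\mathcal{T}$ iff it satisfies all the GCIs in $\mathcal{T}$.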
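The semantics can be made tangible with a small sketch that computes the extension of a concept in a finite interpretation and checks whether the interpretation is a model of a TBox. It assumes the usual constructors of this logic (concept names, negation, conjunction, and existential restriction); the tuple encoding of concepts and all function names are hypothetical choices for this example.

```python
def extension(concept, domain, concepts, roles):
    """Return the set of domain elements that belong to `concept`."""
    kind = concept[0]
    if kind == "name":    # concept name
        return set(concepts.get(concept[1], ()))
    if kind == "not":     # negation: complement w.r.t. the domain
        return set(domain) - extension(concept[1], domain, concepts, roles)
    if kind == "and":     # conjunction: intersection
        return (extension(concept[1], domain, concepts, roles)
                & extension(concept[2], domain, concepts, roles))
    if kind == "exists":  # existential restriction: some role successor in the filler
        filler = extension(concept[2], domain, concepts, roles)
        return {d for (d, e) in roles.get(concept[1], ()) if e in filler}
    raise ValueError(f"unknown constructor: {kind}")


def satisfies_gci(lhs, rhs, domain, concepts, roles):
    """A GCI lhs ⊑ rhs is satisfied iff the extension of lhs is a subset of that of rhs."""
    return (extension(lhs, domain, concepts, roles)
            <= extension(rhs, domain, concepts, roles))


def is_model(tbox, domain, concepts, roles):
    """The interpretation is a model iff it satisfies every GCI in the TBox."""
    return all(satisfies_gci(l, r, domain, concepts, roles) for (l, r) in tbox)


# Hypothetical interpretation with two elements.
domain = {1, 2}
concepts = {"A": {1}, "B": {2}}
roles = {"r": {(1, 2)}}
A, B = ("name", "A"), ("name", "B")
tbox = [(A, ("exists", "r", B))]  # A ⊑ ∃r.B
```

In this interpretation the GCI in `tbox` is satisfied, whereas `A ⊑ B` is not. Deciding subsumption requires reasoning over all models, which is what the algorithms discussed later do.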

One of the main reasoning problems in DLs is to decide *subsumption* between two concepts $C,D$
w.r.t. a TBox $\mathcal{T}$; that is, to verify that every model of the TBox $\mathcal{T}$ also satisfies the GCI $C\sqsubseteq D$.
If this is the case, we denote it as $\mathcal{T}\models C\sqsubseteq D$.
It is easy to see that the relation $\models$ defines a c-property over the class $\mathfrak{A}$ of axioms containing all
possible GCIs; in this case, an ontology is a TBox.

The following example instantiates the basic ideas presented in this section.

###### Example 1.

Consider for example the TBox containing the axioms

where are the propositional variables labelling the axiom. It is easy to see that , and there are two justifications for this fact; namely, the TBoxes and . From this, it follows that is a pinpointing formula for w.r.t. .

## 3 Consequence-based Algorithms

Abstracting from particularities, a *consequence-based algorithm* works on a set $\mathcal{S}$
of *consequences*, which is expanded through rule applications. Algorithms of this kind have
two phases. The *normalization* phase first transforms all the axioms in an ontology into a suitable normal
form.
The *saturation* phase initializes the set $\mathcal{S}$ of *derived consequences* with the normalized ontology
and applies the rules to expand it. The set $\mathcal{S}$ is often called a *state*.
As mentioned, the initial
state $\mathcal{S}_0$ contains the normalization of the input ontology $\mathcal{O}$. A *rule* is of
the form $B_0\to B_1$, where $B_0,B_1$ are finite sets of consequences.
This rule is *applicable* to the state $\mathcal{S}$ if $B_0\subseteq\mathcal{S}$
and $B_1\not\subseteq\mathcal{S}$. Its *application*
extends $\mathcal{S}$ to $\mathcal{S}\cup B_1$. $\mathcal{S}$ is *saturated* if no rule is applicable to it. The method
*terminates* if $\mathcal{S}$ is saturated after finitely many rule applications, independently of the rule
application order chosen.
For the rest of this section and most of the following, we assume that the input ontology is already in this normal
form, and focus only on the second phase.
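The saturation phase described above can be sketched as a simple fixpoint loop. The sketch assumes that rules are given as ground instances, each a pair of a premise set and a result set, whereas actual algorithms use rule schemata; the function name `saturate` and the toy rules over string consequences are hypothetical.

```python
def saturate(initial_state, rules):
    """Apply rules until no rule is applicable; return the saturated state.

    Each rule is a pair (premises, results) of sets of consequences.
    A rule is applicable if all premises are in the state and at least
    one result is still missing from it.
    """
    state = set(initial_state)
    changed = True
    while changed:
        changed = False
        for premises, results in rules:
            if premises <= state and not results <= state:
                state |= results  # rule application only extends the state
                changed = True
    return state


# Hypothetical ground rules over string-valued consequences.
rules = [
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"a", "b"}), frozenset({"c"})),
    (frozenset({"d"}), frozenset({"e"})),
]
saturated = saturate({"a"}, rules)
```

With finitely many ground rules the loop always terminates, since every application adds at least one new consequence and the set of possible results is finite.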

Given a rule $R$, we use $\mathsf{pre}(R)$ and $\mathsf{res}(R)$ to denote the sets of premises that trigger $R$ and of consequences resulting from its application, respectively. If the state $\mathcal{S}'$ is obtained from $\mathcal{S}$ through the application of the rule $R$, we write $\mathcal{S}\to_R\mathcal{S}'$, and write $\mathcal{S}\to\mathcal{S}'$ if the precise rule used is not relevant.

Consequence-based algorithms derive, in a single execution, several axioms that are entailed from the input
ontology. Obviously, in general they cannot generate *all* possible entailed axioms, as such a set may be
infinite (e.g., in the case of $\mathcal{ALC}$). Thus, to define correctness, we need to specify for every ontology $\mathcal{O}$, a
finite set $\delta(\mathcal{O})$ of *derivable consequences* of $\mathcal{O}$.

###### Definition 1 (Correctness).

A consequence-based algorithm is *correct* for the consequence property $\models$ if for every ontology
$\mathcal{O}$, the following two conditions hold: (i) it terminates, and
(ii) if $\mathcal{O}\to^*\mathcal{S}$ (that is, $\mathcal{S}$ is reached from $\mathcal{O}$ by a finite sequence of rule applications) and $\mathcal{S}$ is saturated, then for every derivable consequence $\alpha\in\delta(\mathcal{O})$ it
follows that $\mathcal{O}\models\alpha$ iff $\alpha\in\mathcal{S}$.

That is, the algorithm is correct for a property if it terminates and is sound and complete w.r.t. the finite set of derivable consequences $\delta(\mathcal{O})$.

Notice that the definition of correctness requires that the resulting set of consequences obtained from the application of the rules is always the same, independently of the order in which the rules are applied. In other words, if , , and are both saturated, then . This is a fundamental property that will be helpful for showing correctness of the pinpointing extensions in the next section.

A well-known example of a consequence-based algorithm is the reasoning method from [SiKH-IJCAI11].
To
describe this algorithm we need some notation. A *literal* is either a concept name or a negated concept
name. Let denote (possibly empty) conjunctions of literals, and are (possibly empty)
disjunctions of concept names. For simplicity, we treat these conjunctions and disjunctions as sets.
The normalization phase transforms all GCIs to be of the form:

For a given TBox , the set of derivable consequences contains all GCIs of the form and . The saturation phase initializes to contain the axioms in the (normalized) TBox, and applies the rules from Table 1 until a saturated state is found.

[Table 1: the seven rules (1 to 7) of the consequence-based algorithm described in this section; rule bodies not recoverable.]

After termination, one can check that for every derivable consequence it holds that iff ; that is, this algorithm is correct for the property [SiKH-IJCAI11].

###### Example 2.

Recall the TBox from Example 1. Notice that all axioms in this TBox are already in normal form; hence the normalization step does not modify it. The consequence-based algorithm starts with and applies the rules until saturation. One possible execution of the algorithm is

where contains and the result of adding all the tautologies generated by the application of Rule over it (see Figure 2).

Since rule applications only extend the set of consequences, we depict exclusively the newly added consequence; e.g., the first rule application is in fact representing . When the execution of the method terminates, the set of consequences contains ; hence we can conclude that this subsumption follows from . Notice that other consequences (e.g., ) are also derived from the same execution.

For the rest of this paper, we consider an arbitrary, but fixed, consequence-based algorithm, that is correct for a given c-property .

## 4 The Pinpointing Extension

Our goal is to extend consequence-based algorithms from the previous section to methods that compute
pinpointing
formulae for their consequences. We achieve this by modifying the notion of states, and the rule applications
on them.
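Recall that every axiom $\alpha$ in the class $\mathfrak{A}$ (and hence, also every axiom in the ontology $\mathcal{O}$) is labelled
with a unique propositional variable
$\mathsf{lab}(\alpha)$. In a similar manner, we consider sets of consequences that are labelled with
a monotone Boolean formula.
We use the notation $\mathcal{S}^{\mathsf{pin}}$ to indicate that the elements in the set $\mathcal{S}$ are labelled in this way,
and use $(\alpha:\phi_\alpha)\in\mathcal{S}^{\mathsf{pin}}$ to express that
the consequence $\alpha$, labelled with the formula $\phi_\alpha$, belongs to $\mathcal{S}^{\mathsf{pin}}$.
A *pinpointing state* is a set of labelled consequences.
We assume that each consequence in this set is labelled with only one formula.
For a set of labelled consequences $\mathcal{S}^{\mathsf{pin}}$ and a set of (unlabelled) consequences $\mathcal{S}$, we define
$\mathsf{fm}(\mathcal{S},\mathcal{S}^{\mathsf{pin}}) := \bigwedge_{\alpha\in\mathcal{S}}\phi_\alpha$, where $\phi_\alpha=\bot$ if
$\alpha$ does not occur in $\mathcal{S}^{\mathsf{pin}}$.

A consequence-based algorithm induces a pinpointing consequence-based algorithm by modifying the notion of rule application, and dealing with pinpointing states, instead of classical states, through a modification of the formulae labelling the derived consequences.

###### Definition 2 (Pinpointing Application).

The rule $R$, with premises $\mathsf{pre}(R)$ and results $\mathsf{res}(R)$, is *pinpointing applicable* to the pinpointing state $\mathcal{S}^{\mathsf{pin}}$ if
$\mathsf{fm}(\mathsf{pre}(R),\mathcal{S}^{\mathsf{pin}}) \not\models \mathsf{fm}(\mathsf{res}(R),\mathcal{S}^{\mathsf{pin}})$, where $\models$ here denotes entailment between propositional formulas.
The *pinpointing application* of this rule
modifies $\mathcal{S}^{\mathsf{pin}}$ to:

$$\{(\alpha : \phi_\alpha\vee\mathsf{fm}(\mathsf{pre}(R),\mathcal{S}^{\mathsf{pin}})) \mid \alpha\in\mathsf{res}(R)\} \;\cup\; \{(\alpha:\phi_\alpha)\in\mathcal{S}^{\mathsf{pin}} \mid \alpha\notin\mathsf{res}(R)\}.$$

The pinpointing state $\mathcal{S}^{\mathsf{pin}}$ is *pinpointing saturated* if no rule is pinpointing applicable to it.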
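The pinpointing application of a rule can be sketched as follows. To keep the entailment test between labels simple, this sketch stores every label as a minimized monotone DNF (a set of terms, each term a set of variables) rather than as an arbitrary formula; that is a design choice of the example and may blow up exponentially compared to a compact formula. The rules are ground instances, a consequence missing from the state is treated as labelled with false, and all names as well as the toy transitivity rule are hypothetical.

```python
from itertools import product

BOT = frozenset()               # no term at all: false
TOP = frozenset({frozenset()})  # the empty term: true


def var(name):
    return frozenset({frozenset({name})})


def minimize(terms):
    """Drop every term that strictly contains another term."""
    terms = set(terms)
    return frozenset(t for t in terms if not any(o < t for o in terms))


def disj(phi, psi):
    return minimize(phi | psi)


def conj(phi, psi):
    return minimize(s | t for s, t in product(phi, psi))


def entails(phi, psi):
    """For monotone DNFs: every term of phi must contain some term of psi."""
    return all(any(t <= s for t in psi) for s in phi)


def fm(consequences, pin_state):
    """Conjunction of the labels; absent consequences count as BOT."""
    result = TOP
    for c in consequences:
        result = conj(result, pin_state.get(c, BOT))
    return result


def pinpointing_apply(rule, pin_state):
    """Apply `rule` if it is pinpointing applicable; report whether it was."""
    premises, results = rule
    pre = fm(premises, pin_state)
    if entails(pre, fm(results, pin_state)):
        return False
    for c in results:
        pin_state[c] = disj(pin_state.get(c, BOT), pre)
    return True


def pinpointing_saturate(labelled_axioms, rules):
    pin_state = dict(labelled_axioms)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            changed = pinpointing_apply(rule, pin_state) or changed
    return pin_state


# Hypothetical input: three labelled axioms and one ground transitivity rule.
axioms = {"A⊑B": var("p"), "B⊑C": var("q"), "A⊑C": var("r")}
rules = [(frozenset({"A⊑B", "B⊑C"}), frozenset({"A⊑C"}))]
final = pinpointing_saturate(axioms, rules)
```

Here the rule is not applicable in the classical sense, because `A⊑C` is already in the state, but it is pinpointing applicable: its application weakens the label of `A⊑C` from `r` to `r ∨ (p ∧ q)`, after which the state is pinpointing saturated.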

We denote as the fact that is obtained from the pinpointing application of the rule to . As before, we drop the subscript if the name of the rule is irrelevant and write simply . The pinpointing extension starts, as the classical one, with the set of all normalized axioms. For the rest of this section, we assume that the input ontology is already normalized, and hence each axiom in the initial pinpointing state is labelled with its corresponding propositional variable. In the next section we show how to deal with normalization.

###### Example 3.

Consider again the TBox from Example 1. At the beginning of the execution of the pinpointing algorithm, the set of consequences is the TBox, with each axiom labelled by the unique propositional variable representing it; that is . A pinpointing application of Rule 1 adds the new consequence , where the tautology labelling this consequence arises from the fact that rule 1 has no premises. At this point, one can pinpointing apply Rule 5 with

(see the solid arrow in Figure 2). In this case, and because the consequence does not belong to yet. Hence , and the rule is indeed pinpointing applicable. The pinpointing application of this rule adds the new labelled consequence to . Then, Rule 3 becomes pinpointing applicable with

which adds to the set of consequences. Then, Rule 7 over the set of premises

yields the new consequence .

Notice that, at this point Rule 6 is not applicable in the classical case over the set of premises because its (regular) application would add the consequence that was already derived. However,

hence, the rule is in fact pinpointing applicable. The pinpointing application of this Rule 6 substitutes the labelled consequence with the consequence . The pinpointing extension will then continue applying rules until a saturated state is reached. This execution is summarized in Figure 3.

At that point, the set of labelled consequences will contain, among others, . The label of this consequence corresponds to the pinpointing formula that was computed in Example 1.

Notice that if a rule is applicable to some state , then it is also pinpointing applicable to it. This holds because the regular applicability condition requires that at least one consequence in should not exist already in the state , which is equivalent to having the consequence . Indeed, we used this fact in the first pinpointing rule applications of Example 3. If the consequence-based algorithm is correct, then it follows by definition that for any saturated state obtained by a sequence of rule applications from , iff . Conversely, as shown next, every consequence created by a pinpointing rule application is also generated by a regular rule application. First, we extend the notion of a -projection to sets of consequences (i.e., states) in the obvious manner: .

###### Lemma 1.

Let be pinpointing states and let be a valuation. If then .

###### Proof.

We show that if then or , where is a rule. If does not satisfy then since the labels of the newly added assertions are not satisfied by , and the disjunction with does not change the evaluation of the modified labels under . On the other hand, if satisfies then . If then . Otherwise, again we have . ∎

Since all the labels are monotone Boolean formulae, it follows that the valuation that makes every propositional variable true satisfies all labels, and hence for every pinpointing state , . Lemma 1 hence entails that the pinpointing extension of the consequence-based algorithm does not create new consequences, but only labels these consequences. Termination of the pinpointing extension then follows from the termination of the consequence-based algorithm and the condition for pinpointing rule application that entails that, whenever a rule is pinpointing applied, the set of labelled consequences is necessarily modified either by adding a new consequence, or by modifying the label of at least one existing consequence to a weaker (i.e., more general) monotone Boolean formula. Since there are only finitely many monotone Boolean formulas over , every label can be changed finitely many times only.

It is in fact possible to get a better understanding of the running time of the pinpointing extension of a consequence-based algorithm. Suppose that, on input $\mathcal{O}$, the consequence-based algorithm stops after at most $f(\mathcal{O})$ rule applications. Since every rule application must add at least one consequence to the state, the saturated state reached by this algorithm will have at most $f(\mathcal{O})$ consequences. Consider now the pinpointing extension of this algorithm. We know, from the previous discussion, that this pinpointing extension generates the same set of consequences. Moreover, since there are $2^{|\mathcal{O}|}$ possible valuations over $\mathsf{lab}(\mathcal{O})$, and every pinpointing rule application that does not add a new consequence must generalize at least one formula, the labels of each consequence can be modified at most $2^{|\mathcal{O}|}$ times. Overall, this means that the pinpointing extension stops after at most $2^{|\mathcal{O}|}\cdot f(\mathcal{O})$ rule applications. We now formalize this result.

###### Theorem 4.1.

If a consequence-based algorithm stops after at most $f(\mathcal{O})$ rule applications, then its pinpointing extension stops after at most $2^{|\mathcal{O}|}\cdot f(\mathcal{O})$ rule applications.

Another important property of the pinpointing extension is that saturatedness of a state is preserved under projections.

###### Lemma 2.

Let be a pinpointing state and a valuation. If is pinpointing saturated then is saturated.

###### Proof.

Suppose there is a rule such that is applicable to . This means that and . We show that is pinpointing applicable to . Since , satisfies . As , there is such that either or but does not satisfy . In the former case, is clearly pinpointing applicable to . In the latter, since satisfies but not . ∎

We can now show that the pinpointing extension of a consequence-based algorithm is indeed a pinpointing algorithm; that is, that when a saturated pinpointing state is reached from rule applications starting from , then for every , is a pinpointing formula for w.r.t. .

###### Theorem 4.2 (Correctness of Pinpointing).

Let be a c-property on axiomatized inputs for and . Given a correct consequence-based algorithm for , for every axiomatized input , where is normalized, then

if , , and is pinpointing saturated, then is a pinpointing formula for and .

###### Proof.

We want to show that is a pinpointing formula for and . That is, for every valuation : iff satisfies .

Assume that , i.e., , and let . Since terminates on every input, there is a saturated state such that . Completeness of then implies that . By assumption, and is pinpointing saturated. By Lemma 1 it follows that , and by Lemma 2, is saturated. Hence, since is correct, . This implies that because .

Conversely, suppose that satisfies . By assumption, , , and is saturated. By Lemma 1, . Since satisfies , . Then, by soundness of , . ∎

As was the case for classical consequence-based algorithms, their pinpointing extensions can apply the rules in any desired order. The notion of correctness of consequence-based algorithms guarantees that a saturated state will always be found, and that the result will be the same, regardless of the order in which the rules are applied. We have previously seen that termination also transfers to the pinpointing extensions. Theorem 4.2 also shows that the formulae associated with the derived consequences are always equivalent.

###### Corollary 1.

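Let $\mathcal{S}^{\mathsf{pin}},\mathcal{S}'^{\mathsf{pin}}$ be two pinpointing saturated states, $\mathcal{O}$ an ontology, and $\alpha$ a consequence such that both states are reached by pinpointing rule applications from $\mathcal{O}$. If $(\alpha:\phi)\in\mathcal{S}^{\mathsf{pin}}$ and $(\alpha:\phi')\in\mathcal{S}'^{\mathsf{pin}}$, then $\phi\equiv\phi'$.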

To finalize this section, we consider again our running example of deciding subsumption in $\mathcal{ALC}$ described in Section 3. It terminates after a number of rule applications that is exponential in the size of the input TBox $\mathcal{T}$. Notice that every pinpointing rule application requires an entailment test between two monotone Boolean formulas, which can be decided in non-deterministic polynomial time on $|\mathsf{lab}(\mathcal{T})|$. Thus, it follows from Theorem 4.1 that the pinpointing extension of the consequence-based algorithm for $\mathcal{ALC}$ runs in exponential time.

###### Corollary 2.

Let $\mathcal{T}$ be an $\mathcal{ALC}$ TBox, and $C,D$ two $\mathcal{ALC}$ concepts. A pinpointing formula for $C\sqsubseteq D$ w.r.t. $\mathcal{T}$ is computable in exponential time.

## 5 Dealing with Normalization

Throughout the last two sections, we have disregarded the first phase of the consequence-based algorithms
in which the axioms in the input ontology are transformed into a suitable normal form. In a nutshell, the
normalization phase takes every axiom in the ontology and substitutes it by a set of simpler axioms that are,
in combination, equivalent to the original one w.r.t. the set of derivable consequences. For example, in the
axiom is *not* in normal form. During the normalization phase, it would then
be substituted by the two axioms , , which in combination provide the exact
same constraints as the original axiom.

Obviously, in the context of pinpointing, we are interested in finding the set of *original* axioms that
cause the consequence of interest, and not those in normal form; in fact, normalization is an internal process
of the algorithm, and the user should be agnostic to the internal structures used. Hence, we need to find a way
to track the original axioms.

To solve this, we slightly modify the initialization of the pinpointing extension. Recall from the previous section that, if the input ontology is already in normal form, then we initialize the algorithm with the state that contains exactly that ontology, where every axiom is labelled with the unique propositional variable that represents it. If the ontology is not originally in normal form, then it is first normalized. In this case, we set as the initial state the newly normalized ontology, but every axiom is labelled with the disjunction of the variables representing the axioms that generated it. The following example explains this idea.

###### Example 4.

Consider a variant of the TBox from Example 1 that is now formed by the three axioms

Obviously, the first axiom is not in normal form, but can be normalized by substituting it with the two axioms , . Thus, the normalization step yields the same TBox from Example 1. However, instead of using different propositional variables to label these two axioms, they just inherit the label from the axiom that generated them; in this case . Thus, the pinpointing algorithm is initialized with

Following the same process as in Example 3, we see that we can derive the consequence . Hence is a pinpointing formula for w.r.t. . It can be easily verified that this is in fact the case.

Thus, the normalization phase affects neither the correctness nor the complexity of the pinpointing extension of a consequence-based algorithm.
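The modified initialization can be sketched as follows: each normalized axiom collects the variables of all original axioms that generate it, and this set of variables is read as a disjunction. The normalization function `split_conjunction`, which only splits a conjunction on the right-hand side of an inclusion, and the pair encoding of axioms are hypothetical simplifications for the example.

```python
def split_conjunction(axiom):
    """Hypothetical normalization: split lhs ⊑ (c1 ⊓ ... ⊓ cn) into n axioms."""
    lhs, conjuncts = axiom
    return [(lhs, (c,)) for c in conjuncts]


def initial_pinpointing_state(labelled_ontology, normalize):
    """Label each normalized axiom with the variables of its source axioms.

    The returned set of variables stands for their disjunction.
    """
    state = {}
    for variable, axiom in labelled_ontology.items():
        for normal_axiom in normalize(axiom):
            state.setdefault(normal_axiom, set()).add(variable)
    return state


# Hypothetical ontology: p labels A ⊑ B ⊓ C, and q labels A ⊑ B.
ontology = {"p": ("A", ("B", "C")), "q": ("A", ("B",))}
state = initial_pinpointing_state(ontology, split_conjunction)
```

In this example the normalized axiom `A ⊑ B` is generated by both original axioms and is therefore labelled with `p ∨ q`, while `A ⊑ C` only inherits the label `p`.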

## 6 Conclusions

We presented a general framework to extend consequence-based algorithms with axiom pinpointing. These algorithms often enjoy optimal upper bound complexity and can be efficiently implemented in practice. Our focus and use case in this paper is the prototypical ontology language $\mathcal{ALC}$. We emphasize that this is only one of many consequence-based algorithms available. The completion-based algorithm for [baader-ijcai05] is obtained by restricting the assertions to be of the form and with and , and adding one rule to handle role constructors. Other examples of consequence-based methods include the LTUR approach for Horn clauses [minoux-ipl88], and methods for more expressive and Horn DLs [Kaza09, horrocks-kr16, kazakov-jar14].

Understanding the axiomatic causes for a consequence, and in particular the pinpointing formula, has importance
beyond MinA enumeration. For example, the pinpointing formula also encodes all the ways to *repair*
an ontology [arif-ki15]. Depending on the application at hand, a simpler version of the formula can be
computed, potentially more efficiently. This idea has already been employed to find good approximations
for MinAs [BaaSun-KRMED-08] and lean kernels [PMIM17] efficiently.

As future work, it would be interesting to investigate how algorithms for query answering in an ontology-based data access setting can be extended with the pinpointing technique. The pinpointing formula in this case could also be seen as a provenance polynomial, as introduced by Green et al. [Green07-provenance-seminal] in database theory. Another direction is to investigate axiom pinpointing in decision procedures for non-monotonic reasoning, where one would also expect the presence of negations in the pinpointing formula.
