1 Introduction
A justification for a consequence is a minimal subset of the ontology that still entails it. The problem of computing justifications, also known as axiom pinpointing, has been widely studied in the context of description logics (DLs) [PenaPP20]. Axiom pinpointing methods can be separated into two main classes, commonly known as blackbox and glassbox.
Blackbox approaches [kalyanpur2005debugging, kalyanpur2006debugging, parsia2005debugging] use existing reasoners as an oracle, and require no further modification of the reasoning method. Therefore, these approaches work for ontologies written in any monotonic logical language (including expressive DLs such as ), as long as a reasoner supporting it exists. In their most naïve form, blackbox methods check all possible subsets of the ontology for the desired entailment and compute the justifications from these results. In reality, many optimisations have been developed to reduce the number of calls needed, and avoid irrelevant work.
Glassbox approaches, on the other hand, modify the reasoning algorithm to output one or all justifications directly, from only one call. While the theory for developing glassbox methods has been developed for tableaux and automata-based reasoners [BaPeJLC10, BaPeJAR10, baader1995embedding, BaPSKI07], in practice not many of these methods have been implemented, as they require new implementation efforts and deactivating the optimisation techniques that make reasoners practical. A promising approach, first proposed in [SeVeCADE09], is to reduce, through a reasoning simulation, the axiom pinpointing problem to an enumeration problem from a propositional formula, and use state-of-the-art SAT-solving methods to enumerate all the justifications. This idea has led to effective axiom pinpointing systems developed primarily for the lightweight DL $\mathcal{EL}$ [PULi, beacon, EL2MCS, EL2MUS, SATpin].
The interest of axiom pinpointing goes beyond enumerating justifications. Modelling ontologies is a time-consuming and fallible task. Indeed, during the modelling phase it is not uncommon to discover unexpected or wrong entailments. One way to fix these errors is to diagnose the causes by computing a hitting set of all the justifications. However, as there might exist exponentially many justifications for a given entailment w.r.t. an ontology, even for $\mathcal{EL}$ ontologies, finding all justifications is not feasible in general. One approach is to approximate the information by the union and intersection of all justifications. If the intersection is not empty, then any axiom in this intersection, when removed, guarantees that the consequence will not follow anymore. From the union, a knowledge engineer has a more precise view on the problematic instances, and can make a detailed analysis.
Although much work has focused on methods for computing one or all justifications efficiently, to the best of our knowledge there is little work on computing their intersection or union without enumerating them first, beyond the approximations presented in [conf/esws/PenalozaMIM17, journals/ki/Penaloza20]. In this paper, we propose an algorithm for computing the intersection of all justifications. This algorithm has the same worst-case behaviour as the blackbox algorithm for computing one justification. Additionally, we present two approaches for computing the union of all justifications: one is based on the blackbox algorithm for finding all justifications, and the other uses the SAT tool cmMUS.
The paper is structured as follows. In Section 2 we recall relevant definitions of description logics and propositional logic. Section 3 presents the algorithm for computing the intersection of all justifications without computing any single justification. We propose two methods for computing the union of all justifications in Section 4. We explain how to use the union and intersection of all justifications to repair ontologies in Section 5. Before concluding, an evaluation of our methods on real-world ontologies is presented in Section 6.
2 Justifications and Repairs in $\mathcal{ALCH}$
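We briefly recall the notions of justifications and repairs in $\mathcal{ALCH}$. Let $N_C$, $N_R$ and $N_I$ be mutually disjoint sets of concept names, role names, and individual names. The set of $\mathcal{ALCH}$ concepts is built through the following grammar rule
$$C ::= \top \mid \bot \mid A \mid \neg C \mid C \sqcap C \mid C \sqcup C \mid \exists r.C \mid \forall r.C,$$
where $A \in N_C$ and $r \in N_R$. An $\mathcal{ALCH}$ TBox is a finite set of general concept inclusions (GCIs) of the form $C \sqsubseteq D$ and role inclusions $r \sqsubseteq s$, where $C$ and $D$ are $\mathcal{ALCH}$ concepts and $r, s \in N_R$. An ABox is a finite set of concept assertions of the form $A(a)$ and role assertions $r(a, b)$, where $A \in N_C$, $r \in N_R$ and $a, b \in N_I$. An $\mathcal{ALCH}$ ontology consists of an $\mathcal{ALCH}$ TBox and an ABox.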
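The semantics of this logic is defined in terms of interpretations. An interpretation is a pair $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ where $\Delta^{\mathcal{I}}$ is a nonempty set called the domain, and $\cdot^{\mathcal{I}}$ is the interpretation function, which maps each concept name $A$ to a subset $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, each role name $r$ to a binary relation $r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$ and each individual name $a$ to a domain element $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$. The interpretation function is extended to concepts as usual: $\top^{\mathcal{I}} = \Delta^{\mathcal{I}}$, $\bot^{\mathcal{I}} = \emptyset$, $(\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$, $(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$, $(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$, $(\exists r.C)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid \exists e \in C^{\mathcal{I}} : (d, e) \in r^{\mathcal{I}}\}$, and $(\forall r.C)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid \forall e : (d, e) \in r^{\mathcal{I}} \Rightarrow e \in C^{\mathcal{I}}\}$. The interpretation $\mathcal{I}$ satisfies $C \sqsubseteq D$ iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and it satisfies $r \sqsubseteq s$ iff $r^{\mathcal{I}} \subseteq s^{\mathcal{I}}$. We write $\mathcal{I} \models \alpha$ if $\mathcal{I}$ satisfies the axiom $\alpha$. The interpretation $\mathcal{I}$ is a model of an ontology $\mathcal{O}$ if $\mathcal{I}$ satisfies all axioms in $\mathcal{O}$. An axiom $\alpha$ is entailed by $\mathcal{O}$, denoted as $\mathcal{O} \models \alpha$, if $\mathcal{I} \models \alpha$ for all models $\mathcal{I}$ of $\mathcal{O}$. We use $|\mathcal{O}|$ to denote the size of $\mathcal{O}$, i.e., the number of axioms in $\mathcal{O}$.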
For this paper, we are interested in the notions of justification and repair.
Definition 1 (Justification, repair)
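Let $\mathcal{O}$ be an ontology and $\alpha$ a GCI. A justification for $\mathcal{O} \models \alpha$ is a subset $\mathcal{M} \subseteq \mathcal{O}$ such that $\mathcal{M} \models \alpha$ and for any $\mathcal{M}' \subsetneq \mathcal{M}$, $\mathcal{M}' \not\models \alpha$. $\mathsf{Just}(\mathcal{O}, \alpha)$ denotes the set of all justifications of $\alpha$ w.r.t. $\mathcal{O}$. A repair for $\mathcal{O} \models \alpha$ is a subontology $\mathcal{R} \subseteq \mathcal{O}$ such that $\mathcal{R} \not\models \alpha$, but $\mathcal{R}' \models \alpha$ for any $\mathcal{R} \subsetneq \mathcal{R}' \subseteq \mathcal{O}$. We denote the set of all repairs as $\mathsf{Rep}(\mathcal{O}, \alpha)$.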
Briefly, a justification is a minimal subset of an ontology that preserves the conclusion. Dually, a repair is a maximal subontology that does not preserve the consequence.
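To make the two notions concrete, the following minimal sketch enumerates all justifications and all repairs of a toy ontology by brute force, directly following the definitions. Everything in it is an assumption made for illustration: axioms are atomic inclusions written as pairs `(A, B)`, the function `entails` is a toy reachability oracle standing in for a DL reasoner, and the names `all_justifications` and `all_repairs` are hypothetical. Because entailment is monotonic, minimality and maximality can be checked by removing or adding a single axiom.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, List, Tuple

Axiom = Tuple[str, str]  # (A, B) stands for the atomic inclusion A ⊑ B


def entails(axioms: Iterable[Axiom], goal: Axiom) -> bool:
    """Toy oracle (assumption): (X, Y) follows iff Y is reachable from X."""
    edges = set(axioms)
    seen, stack = {goal[0]}, [goal[0]]
    while stack:
        node = stack.pop()
        if node == goal[1]:
            return True
        for lhs, rhs in edges:
            if lhs == node and rhs not in seen:
                seen.add(rhs)
                stack.append(rhs)
    return False


def subsets(axioms: Iterable[Axiom]) -> Iterator[FrozenSet[Axiom]]:
    """Yield every subset of the given axioms, smallest first."""
    items = sorted(axioms)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def all_justifications(ontology: Iterable[Axiom], goal: Axiom) -> List[FrozenSet[Axiom]]:
    """Subsets that entail the goal and lose it when any one axiom is dropped."""
    return [s for s in subsets(ontology)
            if entails(s, goal) and not any(entails(s - {a}, goal) for a in s)]


def all_repairs(ontology: Iterable[Axiom], goal: Axiom) -> List[FrozenSet[Axiom]]:
    """Subsets that do not entail the goal but do so once any axiom is added."""
    full = frozenset(ontology)
    return [s for s in subsets(full)
            if not entails(s, goal)
            and all(entails(s | {a}, goal) for a in full - s)]


ontology = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "B"), ("E", "F")}
goal = ("A", "D")
justifications = all_justifications(ontology, goal)
repairs = all_repairs(ontology, goal)
```

In this toy example the goal has two justifications that share the axiom `("B", "D")`, and each repair is the ontology minus a minimal set of axioms that hits both justifications.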
Now we consider a propositional language with a finite set of propositional variables. A literal is a variable or its negation. A clause is a disjunction of literals [chang2014symbolic]. A Boolean formula in Conjunctive Normal Form (CNF) is a conjunction of clauses. A CNF formula $\varphi$ is satisfiable iff there exists a truth assignment that satisfies all clauses in $\varphi$. We can also consider a CNF formula as a set of clauses. A subformula $\varphi' \subseteq \varphi$ is a Minimally Unsatisfiable Subformula (MUS) iff $\varphi'$ is unsatisfiable, but every $\varphi'' \subsetneq \varphi'$ is satisfiable.
3 Computing the Intersection of all Justifications
We first study the problem of computing the intersection of all justifications, which we often call the core. Algorithm 1 provides a method for finding this core.
The algorithm is inspired by the known blackbox approach for finding justifications [KPHS07, BaPSKI07]. Starting from a justification-preserving module (in this case, the locality-based module, Line 3), we try to remove one axiom (Line 4). If the removal of the axiom removes the entailment (Line 5), then the axiom must belong to all justifications (it is a sine qua non requirement for the entailment within the module), and is thus added to the core (Line 6).
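The core computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: the toy function `entails` (reachability over atomic inclusions) stands in for a DL reasoner, the whole ontology is used in place of a locality-based module, and the name `compute_core` is hypothetical.

```python
from typing import Iterable, Set, Tuple

Axiom = Tuple[str, str]  # (A, B) stands for the atomic inclusion A ⊑ B


def entails(axioms: Iterable[Axiom], goal: Axiom) -> bool:
    """Toy oracle (assumption): (X, Y) follows iff Y is reachable from X."""
    edges = set(axioms)
    seen, stack = {goal[0]}, [goal[0]]
    while stack:
        node = stack.pop()
        if node == goal[1]:
            return True
        for lhs, rhs in edges:
            if lhs == node and rhs not in seen:
                seen.add(rhs)
                stack.append(rhs)
    return False


def compute_core(module: Set[Axiom], goal: Axiom) -> Set[Axiom]:
    """Return the axioms whose individual removal destroys the entailment."""
    core = set()
    for axiom in module:
        # One oracle call per axiom; the module itself is never shrunk.
        if not entails(module - {axiom}, goal):
            core.add(axiom)
    return core


ontology = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "B"), ("E", "F")}
core = compute_core(ontology, ("A", "D"))
```

Each axiom is tested against the full module rather than against a shrinking set, which is why the number of oracle calls equals the number of axioms in the module.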
Algorithm 2, on the other hand, generalises the known algorithm for computing a single justification, by considering a (fixed) set $\mathcal{C}$ that is known to be contained in all justifications. If $\mathcal{C} = \emptyset$, the approach works as usual; otherwise, the algorithm avoids trying to remove any axiom from $\mathcal{C}$. This reduces the number of calls to the blackbox reasoner, potentially decreasing the overall execution time.
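A minimal sketch of this generalised single-justification procedure is given below, again under assumptions: `entails` is a toy reachability oracle replacing the reasoner, the input module is assumed to entail the goal, and the name `single_justification` and its `core` parameter are illustrative choices rather than the interface of the actual implementation.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Axiom = Tuple[str, str]  # (A, B) stands for the atomic inclusion A ⊑ B


def entails(axioms: Iterable[Axiom], goal: Axiom) -> bool:
    """Toy oracle (assumption): (X, Y) follows iff Y is reachable from X."""
    edges = set(axioms)
    seen, stack = {goal[0]}, [goal[0]]
    while stack:
        node = stack.pop()
        if node == goal[1]:
            return True
        for lhs, rhs in edges:
            if lhs == node and rhs not in seen:
                seen.add(rhs)
                stack.append(rhs)
    return False


def single_justification(module: Set[Axiom], goal: Axiom,
                         core: FrozenSet[Axiom] = frozenset(),
                         oracle=entails) -> FrozenSet[Axiom]:
    """Shrink the module to one justification, never testing core axioms.

    Precondition (assumption): the module entails the goal and every axiom
    in `core` belongs to all justifications.
    """
    just = set(module)
    for axiom in sorted(module):
        if axiom in core:
            continue  # known to be necessary: no oracle call needed
        if oracle(just - {axiom}, goal):
            just.remove(axiom)  # still entailed without it, so drop it
    return frozenset(just)


ontology = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "B"), ("E", "F")}
justification = single_justification(ontology, ("A", "D"),
                                     core=frozenset({("B", "D")}))
```

With the one-element core supplied, the sketch makes four oracle calls on this five-axiom ontology instead of five; the saving grows with the size of the core.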
As mentioned already, the choice of a locality-based module in these algorithms is arbitrary, and any justification-preserving module would suffice. In particular, we could compute the lean kernel [conf/esws/PenalozaMIM17, conf/ijcai/KoopmannC20] for ontologies, and minimal subsumption modules [ChenLMWISWC17, conf/gcai/ChenL018] for ontologies instead, which are typically smaller, thus reducing the number of iterations within the algorithms. However, as it could be quite expensive to compute such modules, it might not be worthwhile in some cases. The following theorem shows that Algorithm 1 correctly computes the intersection of all justifications.
Theorem 3.1
Let $\mathcal{O}$ be an ontology and $\alpha$ a GCI. Algorithm 1 computes the intersection of all justifications of $\alpha$ w.r.t. $\mathcal{O}$.
Algorithm 1, like all blackbox methods for computing justifications, calls a standard reasoner times. In terms of computational complexity, computing the core requires as many computational resources as computing a single justification. However, computing one justification might be faster in practice, as the size of decreases throughout the execution of Algorithm 2. Clearly, if the core coincides with one justification , then is the only justification.
Corollary 1
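Let $\mathcal{O}$ be an ontology, $\alpha$ a GCI; and let $\mathcal{C}$ be the core and $\mathcal{J}$ a justification for $\mathcal{O} \models \alpha$. If $\mathcal{C} = \mathcal{J}$, then $\mathcal{J}$ is the only justification for $\mathcal{O} \models \alpha$.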
4 Computing the Union of all Justifications
We now present two algorithms for computing the union of all justifications. The first algorithm follows a blackbox approach that calls a standard reasoner as an oracle and uses the core of justifications. This is inspired by Reiter’s Hitting Set Tree algorithm [ReiterDiagnosis] and partially in line with [KPHS07, 10.1007/9783540897040_1]. For the second algorithm, we reduce the problem of computing the union of all justifications to the problem of computing the union of MUSes of a propositional formula. Note that the second algorithm works only for $\mathcal{ALCH}$ ontologies, while the first algorithm can be applied to ontologies with any expressivity, as long as a reasoner is available.
4.1 Blackbox algorithm
The blackbox algorithm for computing all justifications [10.1007/9783540897040_1] was inspired by the algorithm for computing all minimal hitting sets [ReiterDiagnosis]. Some of the improvements to prune the search space were already proposed in [ReiterDiagnosis]. Our method for computing the union of all justifications (Algorithm 3) works in a similar manner, but with a few key differences.
To avoid computing all justifications, we prune the search space when all remaining justifications are fully contained in the union computed so far (Lines 11–12). In addition, we use the core to speed up the search. As the axioms in the core must appear in every justification, we can reduce the number of calls made to the reasoner, and optimise the single justification computation (Line 17). Finally, when we organise our search space, we do not need to consider the axioms in the core (Line 22).
We now describe the UnionofAllJustifications procedure in detail. Given an ontology , a signature , and the intersection of all justifications of w.r.t. as input, a syntactic locality module of w.r.t. is extracted from (Line 2). The justification search tree is a four-tuple , where is a finite set of nodes, is a set of edges, is an edge labelling function, mapping every edge to an axiom , and is the root node. We initialise the variable to represent a justification search tree for having only the root node . In addition, the variables , containing the justifications that have been computed so far, and , containing the already explored nodes of , are both initialised with the empty set. The queue of nodes in that still have to be explored is also set to contain the node as its only element.
The algorithm then enters a loop (Lines 4–24) that runs while is not empty. The loop extracts the first element from and adds it to (Line 5). The axioms that label the edges of the path from to in are collected in the set (Line 7). After that, the algorithm checks whether is redundant. The detailed method for checking redundancy is described in Algorithm 4. The path is redundant iff there exists an explored node such that (a) the axioms in are exactly the axioms labelling the edges of the path from to in (Lines 4–6), or (b) is a leaf node of and the edges of are only labelled with axioms from (Lines 7–8). Case (a) corresponds to early path termination in [ReiterDiagnosis, KPHS07]: the existence of implies that all possible extensions of have already been considered. Case (b) implies that the axioms labelling the edges of lead to the fact that cannot be entailed by the remaining TBox when removed from . Therefore, by monotonicity of , we infer that removing from also has the same consequence, implying that we do not need to explore and all its extensions.
The current iteration can be terminated immediately if (Lines 9–10) as no subset of can be a justification of w.r.t. . In contrast to other blackbox algorithms for computing justifications, we additionally check whether is a subset of . If so, no new axioms belonging to the union of all justifications appear in this subtree. Hence, the algorithm does not need to explore it any further. Subsequently, the variable that will hold a justification of is initialised with . At this point we can check if a justification has already been computed for which (Lines 14–15) holds, in which case we set to . This optimisation step can also be found in [ReiterDiagnosis, KPHS07] and it allows us to avoid a costly call to the SingleJustification procedure. Otherwise, in Line 17 we call SingleJustification on to obtain a justification of w.r.t. . We then check whether is equal to (Lines 18–19), in which case the search for additional justifications can be terminated (recall Corollary 1). Otherwise, the justification is added to in Line 20 and the union of all justifications is updated in Line 21. Finally, for every , the algorithm extends the tree in Lines 22–24 by adding a child to , connected by an edge labelled with . Note that it is sufficient to take as a set with cannot be a justification of w.r.t. . The procedure finishes by returning the set .
Note that this algorithm only adds justifications to . For completeness, one can show that the locality-based module of w.r.t. contains all the minimal modules of w.r.t. . Moreover, it is easy to see that the proposed optimisations do not lead to a minimal module not being computed. Overall, we obtain the following result.
Theorem 4.1
Let $\mathcal{O}$ be an ontology, $\alpha$ a GCI, and $\mathcal{C}$ the core of $\alpha$ w.r.t. $\mathcal{O}$. The procedure UnionofAllJustifications computes the union of all justifications of $\alpha$ w.r.t. $\mathcal{O}$.
Algorithm 3 terminates on any input as the paths in the module search tree for that is constructed during the execution represent all the permutations of the axioms in that are relevant for finding all minimal modules. It is easy to see that the procedure UnionofAllJustifications runs in exponential time in the size of (and polynomially in , , and ) in the worst case.
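The search described in this subsection can be sketched as follows. This is a simplified illustration under assumptions and not a transcription of Algorithm 3: `entails` is a toy reachability oracle instead of a reasoner, no locality-based module is extracted, nodes are identified with the set of axioms removed along their path, and all function and variable names are hypothetical. The sketch keeps the main ingredients: early path termination, reuse of known dead ends and known justifications, pruning when the remaining axioms already lie in the union, and skipping core axioms when branching.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, Set, Tuple

Axiom = Tuple[str, str]  # (A, B) stands for the atomic inclusion A ⊑ B


def entails(axioms: Iterable[Axiom], goal: Axiom) -> bool:
    """Toy oracle (assumption): (X, Y) follows iff Y is reachable from X."""
    edges = set(axioms)
    seen, stack = {goal[0]}, [goal[0]]
    while stack:
        node = stack.pop()
        if node == goal[1]:
            return True
        for lhs, rhs in edges:
            if lhs == node and rhs not in seen:
                seen.add(rhs)
                stack.append(rhs)
    return False


def single_justification(module, goal, core):
    """Shrink an entailing module to one justification, skipping core axioms."""
    just = set(module)
    for axiom in sorted(module):
        if axiom not in core and entails(just - {axiom}, goal):
            just.remove(axiom)
    return frozenset(just)


def union_of_all_justifications(ontology, goal, core) -> Set[Axiom]:
    ontology, core = frozenset(ontology), frozenset(core)
    union: Set[Axiom] = set()
    found: List[FrozenSet[Axiom]] = []       # justifications computed so far
    explored: Set[FrozenSet[Axiom]] = set()  # removal sets already visited
    dead_ends: List[FrozenSet[Axiom]] = []   # removal sets that kill the goal
    queue = deque([frozenset()])
    while queue:
        removed = queue.popleft()
        # Redundancy: same removal set seen before, or extends a dead end.
        if removed in explored or any(d <= removed for d in dead_ends):
            continue
        explored.add(removed)
        remaining = ontology - removed
        if not entails(remaining, goal):
            dead_ends.append(removed)
            continue
        if remaining <= union:  # no new axiom can be found in this subtree
            continue
        just = next((j for j in found if not j & removed), None)
        if just is None:
            just = single_justification(remaining, goal, core)
            if just == core:  # the core is a justification, hence the only one
                return set(core)
            found.append(just)
        union |= just
        for axiom in just - core:  # removing a core axiom is always a dead end
            queue.append(removed | {axiom})
    return union


ontology = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "B"), ("E", "F")}
union = union_of_all_justifications(ontology, ("A", "D"), {("B", "D")})
```

On this toy input the irrelevant axiom `("E", "F")` prevents the subset pruning from firing, which illustrates why starting from a small justification-preserving module matters in practice.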
4.2 MUS Membership Problem
We now show how to compute the union of all justifications of a GCI by a membership approach. The idea is to check the membership of each axiom, i.e., whether it is a member of some justification. The main procedure is as follows. First, as a preprocessing step, we compute a CNF formula using the consequence-based reasoner condor proposed in [condor] (we restrict to $\mathcal{ALCH}$ in this section, as condor only accepts $\mathcal{ALCH}$ TBoxes). Then, we compute the union of all justifications by checking the membership for each axiom using the SAT tool cmMUS [janota2011cmmus] and this formula. In general, the classification of an $\mathcal{ALCH}$ TBox is of exponential complexity. Since the MUS-membership problem is $\Sigma_2^p$-complete [liberatore2005redundancy], it follows that this method runs in exponential time.
Specifically, the method is divided in two steps:

Compute CNF formula . Let denote (possibly empty) conjunctions of concepts, and (possibly empty) disjunctions of concepts; condor classifies the TBox through the inference rules in Table 1.
Table 1: Inference rules of condor
Each inference rule can be rewritten as a clause. For example, the can be translated to if we denote the as literals . Then the CNF formula is the conjunction of all the clauses corresponding to all the inference rules applied during the classification process. For details see [conf/esws/PenalozaMIM17, SeVeCADE09].

Check membership of each axiom using cmMUS. Given a CNF formula and a subformula , the algorithm cmMUS is used to determine whether there is a MUS such that . We say if there exists such a MUS and otherwise. The membership is checked as follows:

Define a CNF formula , where each literal corresponds to an axiom , and , where is the given conclusion.

Define . Then is unsatisfiable; each MUS corresponds to a justification of ; and , iff belongs to some justification of .

Note that only a small number of clauses in are related to the derivation of . In practice, (i) is the subformula contributing to the derivation of obtained by tracing back from , (ii) is the subformula including only that appears in . Using instead of as the input of algorithm cmMUS can significantly accelerate the cmMUS algorithm.
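The membership test at the heart of this method can be illustrated on a tiny formula. The sketch below is not cmMUS and does not reproduce the condor-based encoding: it decides MUS membership by brute force, which is only feasible for a handful of clauses, and it uses a simplified, assumed encoding in which each atomic inclusion of a toy ontology becomes a single Horn clause over the facts derivable about the concept `A`. Literals follow the DIMACS convention (a positive integer is a variable, a negative integer its negation), and all names are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, Iterable, List

Clause = FrozenSet[int]  # DIMACS-style literals: 3 is a variable, -3 its negation


def satisfiable(clauses: Iterable[Clause]) -> bool:
    """Brute-force SAT check by trying every truth assignment."""
    clauses = list(clauses)
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def is_mus(clauses: FrozenSet[Clause]) -> bool:
    """Unsatisfiable, and satisfiable once any single clause is removed."""
    return (not satisfiable(clauses)
            and all(satisfiable(clauses - {c}) for c in clauses))


def in_some_mus(formula: List[Clause], clause: Clause) -> bool:
    """Check whether the clause occurs in at least one MUS of the formula."""
    others = [c for c in formula if c != clause]
    for size in range(len(others) + 1):
        for rest in combinations(others, size):
            if is_mus(frozenset(rest) | {clause}):
                return True
    return False


# Assumed encoding: variables 1, 2, 3 mean "A ⊑ B", "A ⊑ C", "A ⊑ D" holds;
# variables 4 and 5 concern the unrelated concepts E and F.
axiom_clauses: Dict[str, Clause] = {
    "A ⊑ B": frozenset({1}),
    "A ⊑ C": frozenset({2}),
    "C ⊑ B": frozenset({-2, 1}),
    "B ⊑ D": frozenset({-1, 3}),
    "E ⊑ F": frozenset({-4, 5}),
}
negated_goal: Clause = frozenset({-3})  # assume that A ⊑ D does not hold
formula = list(axiom_clauses.values()) + [negated_goal]

union = {name for name, clause in axiom_clauses.items()
         if in_some_mus(formula, clause)}
```

In this simplified encoding every MUS consists of the negated goal together with the clauses of one justification, so collecting the axioms whose clause passes the membership test yields the union of all justifications.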
5 Repairing Ontologies
In this section we propose a notion of optimal repair and provide a method for computing all such optimal repairs.
Definition 2 (Optimal Repair)
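Let $\mathcal{O}$ be an ontology, $\alpha$ a GCI, and $\mathsf{Rep}(\mathcal{O}, \alpha)$ the set of all repairs for $\mathcal{O} \models \alpha$. We say $\mathcal{R} \in \mathsf{Rep}(\mathcal{O}, \alpha)$ is an optimal repair for $\mathcal{O} \models \alpha$, if $|\mathcal{R}| \geq |\mathcal{R}'|$ holds for every $\mathcal{R}' \in \mathsf{Rep}(\mathcal{O}, \alpha)$.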
That is, an optimal repair is a repair that removes the fewest axioms from the original ontology. It is also important to recall the notion of a hitting set.
Definition 3 (Hitting set)
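We say $S$ is a hitting set for a collection of sets $\mathbb{S}$ if $S \cap s \neq \emptyset$ for every $s \in \mathbb{S}$; it is a minimal hitting set if no proper subset of $S$ is a hitting set for $\mathbb{S}$.
We say a minimal hitting set is a smallest minimal hitting set if its size is the smallest among all minimal hitting sets. The following proposition shows how we can compute the set of all optimal repairs through a hitting set computation [ScCoIJCAI03, LiSaSAT05, BaPeJLC10].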
Proposition 1
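Let $\mathbb{J}$ be the set of all justifications for the GCI $\alpha$ w.r.t. the ontology $\mathcal{O}$. If $\mathbb{S}$ is the set of all smallest minimal hitting sets for $\mathbb{J}$, then $\{\mathcal{O} \setminus S \mid S \in \mathbb{S}\}$ is the set of all optimal repairs for $\mathcal{O} \models \alpha$.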
When the core is not empty, a set that consists of only one axiom from the intersection of all justifications is a smallest hitting set for all justifications. We get the following corollary, stating how to compute all optimal repairs faster in this case, as a simple consequence of Proposition 1.
Corollary 2
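Let $\mathcal{O}$ be an ontology, $\alpha$ a GCI and $\mathcal{C}$ the core for $\mathcal{O} \models \alpha$. If $\mathcal{C} \neq \emptyset$, then $\{\mathcal{O} \setminus \{\beta\} \mid \beta \in \mathcal{C}\}$ is the set of all optimal repairs for $\mathcal{O} \models \alpha$.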
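Both routes to optimal repairs can be sketched in a few lines. The code below is an illustration under assumptions: axioms are plain strings, the justifications and the core are given as input rather than computed, smallest hitting sets are found by exhaustive search over candidates of increasing size, and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List

Axiom = str


def smallest_hitting_sets(sets: Iterable[Iterable[Axiom]]) -> List[FrozenSet[Axiom]]:
    """Return all hitting sets of minimum cardinality (brute force)."""
    sets = [frozenset(s) for s in sets]
    universe = sorted(set().union(*sets))
    for size in range(1, len(universe) + 1):
        hits = [frozenset(candidate)
                for candidate in combinations(universe, size)
                if all(s.intersection(candidate) for s in sets)]
        if hits:  # the first non-empty level contains exactly the smallest ones
            return hits
    return []


def optimal_repairs(ontology: Iterable[Axiom],
                    justifications: Iterable[Iterable[Axiom]]) -> List[FrozenSet[Axiom]]:
    """Proposition 1: remove a smallest hitting set of all justifications."""
    full = frozenset(ontology)
    return [full - hit for hit in smallest_hitting_sets(justifications)]


def optimal_repairs_from_core(ontology: Iterable[Axiom],
                              core: Iterable[Axiom]) -> List[FrozenSet[Axiom]]:
    """Corollary 2: with a non-empty core, remove any single core axiom."""
    full = frozenset(ontology)
    return [full - {axiom} for axiom in sorted(core)]


ontology = {"A ⊑ B", "B ⊑ D", "A ⊑ C", "C ⊑ B", "E ⊑ F"}
justifications = [{"A ⊑ B", "B ⊑ D"}, {"A ⊑ C", "C ⊑ B", "B ⊑ D"}]
core = {"B ⊑ D"}

via_hitting_sets = optimal_repairs(ontology, justifications)
via_core = optimal_repairs_from_core(ontology, core)
```

The second function needs neither the set of all justifications nor any hitting set computation, which is the practical benefit of a non-empty core.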
The union of all justifications can also be used as a step towards deducing IAR entailments [journals/ki/Penaloza20].
6 Evaluation
To evaluate the performance of our algorithms on real-world ontologies, we built a prototypical implementation. The blackbox algorithm is implemented in Java and uses the OWL API [HorridgeBechhofer2011] to access ontologies and HermiT [GlimmHorrocks2014] as a standard reasoner. The MUS-membership algorithm (MUSMEM) is implemented in Python and calls cmMUS [janota2011cmmus] to detect whether a clause is a member of MUSes. The ontologies used in the evaluation come from the classification task at the ORE competition 2014 [ParsiaMatentzoglu2015]. Among them, we selected the ontologies that have fewer than 10,000 axioms, for a total of 95 ontologies. In the experiments, we computed a single justification, the intersection and the union of all justifications for all atomic concept inclusions that are entailed by the ontologies. (An atomic concept inclusion is an inclusion of the form $A \sqsubseteq B$, where $A$ and $B$ are concept names.) All experiments ran on a machine with two Intel® Xeon® E5-2609 v2 2.5 GHz processors (8 cores), 64 GB of memory, and Ubuntu 18.04. All the figures in this section plot the logarithmic computation time (on the vertical axis) of each test instance (on the horizontal axis).
min     0.001s    0.001s
max     226.608s  341.560s
mean    0.400s    0.456s
median  0.009s    0.002s
Computation time of the core vs. a single justification.
Fig. 2 compares the time to compute the core against the time to compute a single justification. The instances on the horizontal axis are ordered according to the single-justification computation time, represented by the black line. Orange dots represent the core computation time through Algorithm 1. Table 2 provides some basic statistics for comparison. Generally, computing the core is almost as fast as computing one justification, as expected. Note that, although computing the core and computing one justification are equally hard problems in terms of computational complexity, the size of the remaining ontology reduces during the latter process. Intuitively, if $\mathcal{O}' \subseteq \mathcal{O}$, checking whether a subsumption is entailed by $\mathcal{O}'$ would be faster than checking it on $\mathcal{O}$.
Computation time of the union of all justifications.
As a benchmark, we use the OWL API to compute all justifications and then take their union. As our second algorithm can compute the union of all justifications only for $\mathcal{ALCH}$ ontologies, we separate our ontologies into two categories: $\mathcal{ALCH}$ ontologies, and ontologies that are more expressive than $\mathcal{ALCH}$. The computation time for the union of all justifications for $\mathcal{ALCH}$ ontologies is shown in Fig. 6 (the cases with several justifications) and Fig. 6 (the cases with only one justification). Figs. 6 and 6 show the computation time of the union of all justifications for more expressive ontologies when there exist multiple justifications and only one justification, respectively. In Figs. 6–6, each blue, green or red dot corresponds to the computation time of the union by the OWL API, the blackbox algorithm or the MUSMEM algorithm for a conclusion, respectively. We order the conclusions along the X-axis by increasing computation time of the MUSMEM algorithm in the first two figures, and by the blackbox performance in the latter two figures. We observe from these plots that the blackbox algorithm outperforms the other methods and, when available, MUSMEM tends to perform better than a direct use of the OWL API.
Size comparisons for justifications, cores, and unions of justifications.
Fig. 7 illustrates the ratio of the size of the core to the size of a random justification and to the size of the union of all justifications. In our experiments, the intersection of all justifications is empty for only 2.35% of the subsumptions, which means that we could use Corollary 2 to compute optimal repairs for 97.65% of the cases. Moreover, for more than 85% of the cases, the size of a justification equals the size of the core, which indicates that there exists only one justification. When several justifications exist (the second chart from the left of Fig. 7), the ratio of the core size to the size of a random justification falls between 50% and 75% for almost half of the cases. The rightmost chart displays the distribution of the ratio of the core size to the size of the union of all justifications when there exist multiple justifications. This ratio is distributed quite evenly between 0% (exclusive) and 75%. Interestingly, the intersection of all justifications is empty for only 16% of the subsumptions even when several justifications exist.
7 Conclusions
In this paper, we presented algorithms for computing the core (that is, the intersection of all justifications) and the union of all justifications for a given DL consequence. Most of the algorithms are based on repeated calls to a (blackbox) reasoner, and hence apply to ontologies and consequences of any expressivity, as long as a reasoner exists. The only exception is a MUS-based approach for computing the union of all justifications, which depends on the properties of the consequence-based method implemented by condor. Still, the approach should be generalisable without major problems to any language for which consequence-based reasoning methods exist, such as, for instance, [CuGH19, CuGH18].
As an application of our work, we studied how to find optimal repairs effectively, through the information provided by the core and the union of all justifications. Through an empirical analysis, run over more than 100,000 consequences from almost a hundred ontologies from the ORE 2014 competition, we observe that our methods behave better in practice than the usual approach through the OWL API. A more detailed analysis of the experimental results is left for future work.
Our experiments also confirm an observation that has already been made for lightweight ontologies [SuntPhD09], and to a smaller degree in the ontologies from the BioPortal corpus [BailPhD13]; namely, that consequences tend to have one, or only a few, overlapping justifications. In our case, we exploit this fact, and the efficient core computation algorithm, to find optimal repairs in more than 97% of the test instances: those with exactly one justification, where removing any axiom from it leads to an optimal repair.