# Union and Intersection of all Justifications

We present new algorithms for computing the union and intersection of all justifications for a given ontological consequence without first computing the set of all justifications. Through an empirical evaluation, we show that our approach works well in practice for expressive DLs. In particular, the union of all justifications can be computed much faster than with existing justification-enumeration approaches. We further discuss how to use these results to repair ontologies efficiently.

## 1 Introduction

A justification for a consequence $\alpha$ refers to a minimal subset of the ontology that still entails $\alpha$. The problem of computing justifications, also known as axiom pinpointing, has been widely studied in the context of description logics [Pena-PP20]. Axiom pinpointing methods can be separated into two main classes, commonly known as black-box and glass-box.

Black-box approaches [kalyanpur2005debugging, kalyanpur2006debugging, parsia2005debugging] use existing reasoners as an oracle, and require no further modification of the reasoning method. Therefore, these approaches work for ontologies written in any monotonic logical language (including expressive DLs), as long as a reasoner supporting it exists. In their most naïve form, black-box methods check all possible subsets of the ontology for the desired entailment and compute the justifications from these results. In practice, many optimisations have been developed to reduce the number of reasoner calls needed and to avoid irrelevant work.

Glass-box approaches, on the other hand, modify the reasoning algorithm to output one or all justifications directly, from only one call. While the theory for developing glass-box methods has been developed for tableaux and automata-based reasoners [BaPe-JLC10, BaPe-JAR10, baader1995embedding, BaPS-KI07], in practice not many of these methods have been implemented, as they require new implementation efforts and deactivating the optimisation techniques that make reasoners practical. A promising approach, first proposed in [SeVe-CADE09], is to reduce, through a reasoning simulation, the axiom pinpointing problem to an enumeration problem over a propositional formula, and to use state-of-the-art SAT-solving methods to enumerate all the justifications. This idea has led to effective axiom pinpointing systems developed primarily for the lightweight DL $\mathcal{EL}$ [PULi, beacon, EL2MCS, EL2MUS, SATpin].

The interest of axiom pinpointing goes beyond enumerating justifications. Modelling ontologies is a time-consuming and fallible task. Indeed, during the modelling phase it is not uncommon to discover unexpected or wrong entailments. One way to fix these errors is to diagnose the causes by computing a hitting set of all the justifications. However, as there might exist exponentially many justifications for a given entailment w.r.t. an ontology, even for $\mathcal{EL}$-ontologies, finding all justifications is not feasible in general. One approach is to approximate the information by the union and intersection of all justifications. If the intersection is not empty, then removing any axiom in this intersection guarantees that the consequence no longer follows. From the union, a knowledge engineer gets a more precise view of the problematic axioms, and can make a detailed analysis.

Although much work has focused on methods for computing one or all justifications efficiently, to the best of our knowledge there is little work on computing their intersection or union without enumerating them first, beyond the approximations presented in [conf/esws/PenalozaMIM17, journals/ki/Penaloza20]. In this paper, we propose an algorithm for computing the intersection of all justifications. This algorithm has the same worst-case behaviour as the black-box algorithm for computing one justification. Additionally, we present two approaches for computing the union of all justifications: one is based on the black-box algorithm for finding all justifications, and the other uses the SAT tool cmMUS.

The paper is structured as follows. In Section 2 we recall relevant definitions of description logics and propositional logic. Section 3 presents the algorithm for computing the intersection of all justifications without computing any single justification. We propose two methods of computing the union of all justifications in Section 4. We explain how to use the union and intersection of all justifications to repair ontologies in Section 5. Before concluding, an evaluation of our methods on real-world ontologies is presented in Section 6.

## 2 Justifications and Repairs in ALC

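We briefly recall the notions of justifications and repairs in $\mathcal{ALC}$. Let $N_C$, $N_R$ and $N_I$ be mutually disjoint sets of concept names, role names, and individual names. The set of $\mathcal{ALC}$-concepts is built through the following grammar rule

$$C ::= \top \mid \bot \mid A \mid C \sqcap C \mid C \sqcup C \mid \neg C \mid \exists r.C \mid \forall r.C,$$

where $A \in N_C$ and $r \in N_R$. A TBox is a finite set of general concept inclusions (GCIs) of the form $C \sqsubseteq D$ and role inclusions $r \sqsubseteq s$, where $C$ and $D$ are concepts and $r, s \in N_R$. An ABox is a finite set of concept assertions of the form $C(a)$ and role assertions $r(a, b)$, where $C$ is a concept, $r \in N_R$, and $a, b \in N_I$. An ontology consists of a TBox and an ABox.

The semantics of this logic is defined in terms of interpretations. An interpretation is a pair $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ where $\Delta^{\mathcal{I}}$ is a non-empty set called the domain, and $\cdot^{\mathcal{I}}$ is the interpretation function, which maps each concept name $A$ to a subset $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, each role name $r$ to a binary relation $r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$ and each individual $a$ to a domain element $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$. The interpretation function is extended to concepts as usual: $\top^{\mathcal{I}} = \Delta^{\mathcal{I}}$, $\bot^{\mathcal{I}} = \emptyset$, $(\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$, $(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$, $(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$, $(\exists r.C)^{\mathcal{I}} = \{x \mid \exists y.\ (x, y) \in r^{\mathcal{I}} \wedge y \in C^{\mathcal{I}}\}$, and $(\forall r.C)^{\mathcal{I}} = \{x \mid \forall y.\ (x, y) \in r^{\mathcal{I}} \Rightarrow y \in C^{\mathcal{I}}\}$. The interpretation $\mathcal{I}$ satisfies $C \sqsubseteq D$ iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and it satisfies $r \sqsubseteq s$ iff $r^{\mathcal{I}} \subseteq s^{\mathcal{I}}$. We write $\mathcal{I} \models \alpha$ if $\mathcal{I}$ satisfies the axiom $\alpha$. The interpretation $\mathcal{I}$ is a model of an ontology $\mathcal{O}$ if $\mathcal{I}$ satisfies all axioms in $\mathcal{O}$. An axiom $\alpha$ is entailed by $\mathcal{O}$, denoted as $\mathcal{O} \models \alpha$, if $\mathcal{I} \models \alpha$ for all models $\mathcal{I}$ of $\mathcal{O}$. We use $|\mathcal{O}|$ to denote the size of $\mathcal{O}$, i.e., the number of axioms in $\mathcal{O}$.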

For this paper, we are interested in the notions of justification and repair.

###### Definition 1 (Justification, repair)

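Let $\mathcal{O}$ be an ontology and $\alpha$ a GCI. A justification for $\mathcal{O} \models \alpha$ is a subset $\mathcal{J} \subseteq \mathcal{O}$ such that $\mathcal{J} \models \alpha$ and, for any $\mathcal{J}' \subsetneq \mathcal{J}$, $\mathcal{J}' \not\models \alpha$. $\mathsf{Just}(\mathcal{O}, \alpha)$ denotes the set of all justifications of $\alpha$ w.r.t. $\mathcal{O}$. A repair for $\mathcal{O} \models \alpha$ is a subontology $\mathcal{R} \subseteq \mathcal{O}$ such that $\mathcal{R} \not\models \alpha$, but $\mathcal{R}' \models \alpha$ for any $\mathcal{R} \subsetneq \mathcal{R}' \subseteq \mathcal{O}$. We denote the set of all repairs as $\mathsf{Rep}(\mathcal{O}, \alpha)$.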

Briefly, a justification is a minimal subset of an ontology that preserves the conclusion. Dually, a repair is a maximal sub-ontology that does not preserve the consequence.
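The two definitions can be checked mechanically once an entailment oracle is available. The following is a minimal sketch, assuming a toy oracle in which an axiom is a pair `(X, Y)` read as an atomic inclusion of `X` in `Y` and entailment is graph reachability; the names `entails`, `is_justification`, `is_repair`, `ONTOLOGY` and `GOAL` are illustrative and not taken from the paper. Because entailment is monotonic, it suffices to test single-axiom removals and additions.

```python
def entails(axioms, goal):
    """Toy oracle (an assumption): axioms are pairs (X, Y) read as 'X is subsumed by Y'."""
    start, target = goal
    reached, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in axioms:
            if lhs == current and rhs not in reached:
                reached.add(rhs)
                frontier.append(rhs)
    return target in reached


def is_justification(candidate, goal):
    """Minimal entailing subset: entails the goal, but no longer does after dropping any axiom."""
    candidate = frozenset(candidate)
    if not entails(candidate, goal):
        return False
    return all(not entails(candidate - {axiom}, goal) for axiom in candidate)


def is_repair(candidate, ontology, goal):
    """Maximal non-entailing subset: adding back any missing axiom restores the entailment."""
    candidate = frozenset(candidate)
    if entails(candidate, goal):
        return False
    return all(entails(candidate | {axiom}, goal) for axiom in set(ontology) - candidate)


# Hypothetical example: two justifications for A being subsumed by E, sharing the axiom (C, E).
ONTOLOGY = frozenset({("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E"), ("X", "Y")})
GOAL = ("A", "E")
```

In this toy ontology, `{("A", "B"), ("B", "C"), ("C", "E")}` is a justification, and removing only `("C", "E")` yields a repair.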

Now we consider a propositional language with a finite set of propositional variables $\{p_1, \ldots, p_n\}$. A literal is a variable $p_i$ or its negation $\neg p_i$. A clause is a disjunction of literals, denoted by $l_1 \vee \cdots \vee l_k$ [chang2014symbolic]. A Boolean formula in Conjunctive Normal Form (CNF) is a conjunction of clauses. A CNF formula $\varphi$ is satisfiable iff there exists a truth assignment $\mu$ such that $\mu$ satisfies all clauses in $\varphi$. We can also consider a CNF formula as a set of clauses. A subformula $\varphi' \subseteq \varphi$ is a Minimally Unsatisfiable Subformula (MUS) iff $\varphi'$ is unsatisfiable, but every $\varphi'' \subsetneq \varphi'$ is satisfiable.

## 3 Computing the Intersection of all Justifications

We first study the problem of computing the intersection of all justifications, which we often call the core. Algorithm 1 provides a method for finding this core.

The algorithm is inspired by the known black-box approach for finding justifications [KPHS07, BaPS-KI07]. Starting from a justification-preserving module $\mathcal{M}$ (in this case, the locality-based module, Line 3), we try to remove one axiom $\beta$ (Line 4). If the removal of the axiom removes the entailment (Line 5), then $\beta$ must belong to all justifications ($\beta$ is a sine qua non requirement for entailment within $\mathcal{M}$), and is thus added to the core (Line 6).
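The loop described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's Algorithm 1: the oracle `entails` is the same toy reachability check (axioms are pairs of concept names), the whole ontology stands in for the locality-based module, and the function name `core` is hypothetical.

```python
def entails(axioms, goal):
    """Toy oracle (an assumption): axioms are pairs (X, Y) read as 'X is subsumed by Y'."""
    start, target = goal
    reached, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in axioms:
            if lhs == current and rhs not in reached:
                reached.add(rhs)
                frontier.append(rhs)
    return target in reached


def core(ontology, goal, oracle=entails):
    """Intersection of all justifications, using one oracle call per axiom."""
    module = frozenset(ontology)  # stand-in for a justification-preserving module
    if not oracle(module, goal):
        raise ValueError("the goal is not entailed, so there are no justifications")
    result = set()
    for axiom in module:
        # If dropping this single axiom breaks the entailment, every justification needs it.
        if not oracle(module - {axiom}, goal):
            result.add(axiom)
    return result


ONTOLOGY = frozenset({("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E"), ("X", "Y")})
GOAL = ("A", "E")
CORE = core(ONTOLOGY, GOAL)
```

Each test is made against the full module, so, unlike the single-justification computation, the set handed to the oracle never shrinks.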

Algorithm 2, on the other hand, generalises the known algorithm for computing a single justification, by considering a (fixed) set $\mathcal{C}$ that is known to be contained in all justifications. If $\mathcal{C} = \emptyset$, the approach works as usual; otherwise, the algorithm avoids trying to remove any axiom from $\mathcal{C}$. This reduces the number of calls to the black-box reasoner, potentially decreasing the overall execution time.
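A minimal sketch of this generalised single-justification computation, again with a toy reachability oracle standing in for the reasoner; the function name `single_justification`, the parameter `known_core` and the fixed (sorted) removal order are assumptions made for the example.

```python
def entails(axioms, goal):
    """Toy oracle (an assumption): axioms are pairs (X, Y) read as 'X is subsumed by Y'."""
    start, target = goal
    reached, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in axioms:
            if lhs == current and rhs not in reached:
                reached.add(rhs)
                frontier.append(rhs)
    return target in reached


def single_justification(ontology, goal, known_core=frozenset(), oracle=entails):
    """Shrink an entailing set to one justification, never testing axioms of the known core."""
    current = set(ontology)
    for axiom in sorted(ontology):  # deterministic order, chosen only for reproducibility
        if axiom in known_core:
            continue  # contained in every justification, so no oracle call is needed
        if oracle(current - {axiom}, goal):
            current.remove(axiom)  # the entailment survives without it
    return frozenset(current)


ONTOLOGY = frozenset({("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E"), ("X", "Y")})
GOAL = ("A", "E")
JUSTIFICATION = single_justification(ONTOLOGY, GOAL, known_core={("C", "E")})
```

With an empty `known_core` the function behaves like the usual black-box procedure; a non-empty one saves exactly one oracle call per core axiom.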

As mentioned already, the choice of a locality-based module in these algorithms is arbitrary, and any justification-preserving module would suffice. In particular, for the ontology languages that these techniques support, we could instead compute lean kernels [conf/esws/PenalozaMIM17, conf/ijcai/KoopmannC20] or minimal subsumption modules [ChenLMW-ISWC17, conf/gcai/ChenL018], which are typically smaller and thus reduce the number of iterations within the algorithms. However, as it could be quite expensive to compute such modules, it might not be worthwhile in some cases. The following theorem shows that Algorithm 1 correctly computes the intersection of all justifications.

###### Theorem 3.1

Let $\mathcal{O}$ be an ontology and $\alpha$ a GCI. Algorithm 1 computes the intersection of all justifications of $\alpha$ w.r.t. $\mathcal{O}$.

Algorithm 1, like all black-box methods for computing justifications, calls a standard reasoner once per axiom of the extracted module. In terms of computational complexity, computing the core requires as many computational resources as computing a single justification. However, computing one justification might be faster in practice, as the size of the remaining ontology decreases throughout the execution of Algorithm 2. Clearly, if the core coincides with one justification $\mathcal{J}$, then $\mathcal{J}$ is the only justification.

###### Corollary 1

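Let $\mathcal{O}$ be an ontology and $\alpha$ a GCI, and let $\mathcal{C}$ be the core and $\mathcal{J}$ a justification for $\mathcal{O} \models \alpha$. If $\mathcal{C} = \mathcal{J}$, then $\mathcal{J}$ is the only justification for $\mathcal{O} \models \alpha$.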

## 4 Computing the Union of all Justifications

We now present two algorithms for computing the union of all justifications. The first algorithm follows a black-box approach that calls a standard reasoner as an oracle and exploits the core. It is inspired by Reiter's Hitting Set Tree algorithm [ReiterDiagnosis] and partially in line with [KPHS07, 10.1007/978-3-540-89704-0_1]. For the second algorithm, we reduce the problem of computing the union of all justifications to the problem of computing the union of MUSes of a propositional formula. Note that the second algorithm works only for ontologies supported by the consequence-based reasoner it builds on, while the first algorithm can be applied to ontologies of any expressivity, as long as a reasoner is available.

### 4.1 Black-box algorithm

The black-box algorithm for computing all justifications [10.1007/978-3-540-89704-0_1] was inspired by the algorithm for computing all minimal hitting sets [ReiterDiagnosis]. Some of the improvements to prune the search space were already proposed in [ReiterDiagnosis]. Our method for computing the union of all justifications (Algorithm 3) works in a similar manner, but with a few key differences.

To avoid computing all justifications, we prune the search space when all remaining justifications are fully contained in the union computed so far (Lines 11-12). In addition, we use the core to speed up the search. As the axioms in the core must appear in every justification, we can reduce the number of calls made to the reasoner, and optimise the single justification computation (Line 17). Finally, when we organise our search space, we do not need to consider the axioms in the core (Line 22).

We now describe the Union-of-All-Justifications procedure in detail. Given an ontology $\mathcal{O}$, a GCI $\alpha$, and the core $\mathcal{C}$ (the intersection of all justifications of $\alpha$ w.r.t. $\mathcal{O}$) as input, a syntactic locality-based module $\mathcal{M}$ of $\mathcal{O}$ w.r.t. the signature of $\alpha$ is extracted from $\mathcal{O}$ (Line 2). The justification search tree is a four-tuple consisting of a finite set of nodes, a set of edges, an edge labelling function that maps every edge to an axiom of $\mathcal{M}$, and a root node. We initialise the search tree for $\mathcal{O} \models \alpha$ so that it has only the root node. Besides, the set of justifications that have been computed so far and the set of already explored nodes are both initialised with the empty set. The queue of nodes that still have to be explored is set to contain the root node as its only element.

The algorithm then enters a loop (Lines 4–24) that runs while the queue is not empty. The loop extracts the first node from the queue and adds it to the set of explored nodes (Line 5). The axioms that label the edges of the path from the root to this node are collected in a set $H$ (Line 7). After that, the algorithm checks whether $H$ is redundant. The detailed method for checking redundancy is described in Algorithm 4. The path is redundant iff there exists an explored node $v'$ such that (a) the axioms in $H$ are exactly the axioms labelling the edges of the path from the root to $v'$ (Lines 4–6), or (b) $v'$ is a leaf node of the tree and the edges of the path to $v'$ are only labelled with axioms from $H$ (Lines 7–8). Case (a) corresponds to early path termination in [ReiterDiagnosis, KPHS07]: the existence of $v'$ implies that all possible extensions of the current path have already been considered. Case (b) implies that removing the axioms that label the path to $v'$ from $\mathcal{M}$ already prevents $\alpha$ from being entailed by the remaining axioms. Therefore, by monotonicity of entailment, removing $H$ from $\mathcal{M}$ has the same consequence, which implies that we do not need to explore the current node or any of its extensions.

The current iteration can be terminated immediately if $\mathcal{M} \setminus H \not\models \alpha$ (Lines 9–10), as no subset of $\mathcal{M} \setminus H$ can be a justification of $\alpha$ w.r.t. $\mathcal{O}$. In contrast to other black-box algorithms for computing justifications, we additionally check whether $\mathcal{M} \setminus H$ is a subset of the union $\mathcal{U}$ computed so far. If so, no new axioms belonging to the union of all justifications appear in this sub-tree. Hence, the algorithm does not need to explore it any further. Subsequently, the variable $\mathcal{J}$ that will hold a justification of $\alpha$ is initialised. At this point we can check whether a justification $\mathcal{J}'$ has already been computed for which $\mathcal{J}' \cap H = \emptyset$ holds (Lines 14–15), in which case we set $\mathcal{J}$ to $\mathcal{J}'$. This optimisation step can also be found in [ReiterDiagnosis, KPHS07] and it allows us to avoid a costly call to the Single-Justification procedure. Otherwise, in Line 17 we call Single-Justification on $\mathcal{M} \setminus H$ to obtain a justification of $\alpha$ w.r.t. $\mathcal{M} \setminus H$. We then check whether $\mathcal{J}$ is equal to $\mathcal{C}$ (Lines 18–19), in which case the search for additional justifications can be terminated (recall Corollary 1). Otherwise, the justification $\mathcal{J}$ is added to the set of computed justifications in Line 20 and the union $\mathcal{U}$ of all justifications is updated in Line 21. Finally, for every $\beta \in \mathcal{J} \setminus \mathcal{C}$, the algorithm extends the tree in Lines 22–24 by adding a child to the current node, connected by an edge labelled with $\beta$. Note that it is sufficient to take $\beta \in \mathcal{J} \setminus \mathcal{C}$, as a set $\mathcal{M}' \subseteq \mathcal{M}$ with $\mathcal{C} \not\subseteq \mathcal{M}'$ cannot be a justification of $\alpha$ w.r.t. $\mathcal{O}$. The procedure finishes by returning the set $\mathcal{U}$.

Note that this algorithm only adds justifications to the set of computed justifications. For completeness, one can show that the locality-based module $\mathcal{M}$ of $\mathcal{O}$ w.r.t. the signature of $\alpha$ contains all the justifications of $\alpha$ w.r.t. $\mathcal{O}$. Moreover, it is easy to see that the proposed optimisations do not lead to a justification being missed. Overall, we obtain the following result.
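The search just described can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 3: nodes are represented directly by the set of axioms removed on their path, the redundancy test of Algorithm 4 is reduced to skipping removal sets that were already explored, the whole ontology stands in for the locality-based module, and the toy reachability oracle `entails` replaces the reasoner. All function and variable names are assumptions.

```python
from collections import deque


def entails(axioms, goal):
    """Toy oracle (an assumption): axioms are pairs (X, Y) read as 'X is subsumed by Y'."""
    start, target = goal
    reached, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in axioms:
            if lhs == current and rhs not in reached:
                reached.add(rhs)
                frontier.append(rhs)
    return target in reached


def single_justification(axioms, goal, core):
    """Shrink an entailing set to one justification, skipping the axioms of the core."""
    current = set(axioms)
    for axiom in sorted(axioms):
        if axiom not in core and entails(current - {axiom}, goal):
            current.remove(axiom)
    return frozenset(current)


def union_of_all_justifications(ontology, goal, core):
    module, core = frozenset(ontology), frozenset(core)
    union, found = set(), []                      # union so far, justifications so far
    queue, explored = deque([frozenset()]), set()
    while queue:
        removed = queue.popleft()                 # axioms labelling the path to this node
        if removed in explored:
            continue                              # simplified redundancy check
        explored.add(removed)
        remaining = module - removed
        if not entails(remaining, goal):
            continue                              # no justification left in this sub-tree
        if remaining <= union:
            continue                              # nothing new can be added to the union
        just = next((j for j in found if not (j & removed)), None)  # reuse if possible
        if just is None:
            just = single_justification(remaining, goal, core)
            if just == core:
                return set(core)                  # the core is the only justification
            found.append(just)
            union |= just
        for axiom in just - core:                 # core axioms are never branched on
            queue.append(removed | {axiom})
    return union


ONTOLOGY = frozenset({("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E"), ("X", "Y")})
GOAL = ("A", "E")
UNION = union_of_all_justifications(ONTOLOGY, GOAL, core={("C", "E")})
```

In the toy ontology the irrelevant axiom `("X", "Y")` keeps the subset pruning from firing; a real module extraction step would typically remove such axioms before the search starts.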

###### Theorem 4.1

Let $\mathcal{O}$ be an ontology, $\alpha$ a GCI, and $\mathcal{C}$ the core of $\alpha$ w.r.t. $\mathcal{O}$. The procedure Union-of-All-Justifications computes the union of all justifications of $\alpha$ w.r.t. $\mathcal{O}$.

Algorithm 3 terminates on any input, as the paths in the justification search tree that is constructed during the execution represent all the permutations of the axioms in the module that are relevant for finding all justifications. It is easy to see that the procedure Union-of-All-Justifications runs in exponential time in the size of $\mathcal{O}$ in the worst case.

### 4.2 MUS Membership Problem

We now show how to compute the union of all justifications of a GCI by a membership approach. The idea is to check the membership of each axiom, i.e., whether it is a member of some justification. The main procedure is as follows. First, as a pre-processing step, we compute a CNF formula using the consequence-based reasoner condor [condor]; in this section we restrict ourselves to the TBoxes that condor accepts. Then, we compute the union of all justifications by checking the membership of each axiom using the SAT tool cmMUS [janota2011cmmus] on this formula. In general, the classification of such a TBox is of exponential complexity. Since the MUS-membership problem is $\Sigma_2^p$-complete [liberatore2005redundancy], it follows that this method runs in exponential time.

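Specifically, the method is divided into two steps:

1. Compute a CNF formula $\varphi$. condor classifies the TBox $\mathcal{T}$ through the inference rules in Table 1, which operate on (possibly empty) conjunctions and (possibly empty) disjunctions of concepts.

   Each applied inference rule can be rewritten as a clause. If the premises of a rule application are denoted by literals $p_1, \ldots, p_k$ and its conclusion by a literal $q$, then the rule application becomes the clause $\neg p_1 \vee \cdots \vee \neg p_k \vee q$. The CNF formula $\varphi$ is the conjunction of the clauses corresponding to all the inference rules applied during the classification process. For details see [conf/esws/PenalozaMIM17, SeVe-CADE09].

2. Check the membership of each axiom using cmMUS. Given a CNF formula $\Phi$ and a subformula $\Phi'' \subseteq \Phi$, the algorithm cmMUS is used to determine whether there is a MUS $M \subseteq \Phi$ such that $M \cap \Phi'' \neq \emptyset$. We say that the membership test succeeds if such a MUS exists, and that it fails otherwise. The membership is checked as follows:

   1. Define a CNF formula $\varphi_{\mathcal{T}} = \bigwedge_{\beta \in \mathcal{T}} p_\beta$, where each literal $p_\beta$ corresponds to an axiom $\beta \in \mathcal{T}$, and $\varphi_\alpha = \neg p_\alpha$, where $\alpha$ is the given conclusion.

   2. Define $\Phi = \varphi \wedge \varphi_{\mathcal{T}} \wedge \varphi_\alpha$. Then $\Phi$ is unsatisfiable; each MUS of $\Phi$ corresponds to a justification of $\alpha$; and the membership test for $p_\beta$ succeeds iff $\beta$ belongs to some justification of $\alpha$.

Note that only a small number of clauses in $\varphi$ are related to the derivation of $\alpha$. In practice, we therefore use (i) the subformula of $\varphi$ that contributes to the derivation of $\alpha$, obtained by tracing back from $p_\alpha$, and (ii) the subformula of $\varphi_{\mathcal{T}}$ that includes only the literals appearing in the former. Using these reduced formulas instead of the full ones as the input of cmMUS can significantly accelerate the computation.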
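To make the membership question concrete, here is a brute-force illustration on a hand-written toy encoding. It does not call cmMUS or condor: the derivation clauses are invented for the example, the unit clauses for the axioms are the only clauses that may be dropped (the derivation clauses and the negated conclusion are always kept, which is an assumption about the intended encoding), and satisfiability is decided by enumerating all assignments. All names are hypothetical.

```python
from itertools import combinations, product


def satisfiable(clauses, variables):
    """Brute-force SAT check; a clause is a list of (variable, polarity) literals."""
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[var] == sign for var, sign in clause) for clause in clauses):
            return True
    return False


def union_by_membership(axiom_vars, hard_clauses):
    """Axiom variables that occur in some minimal unsatisfiable selection of axioms."""
    variables = sorted({var for clause in hard_clauses for var, _ in clause} | set(axiom_vars))

    def unsat(selected):
        units = [[(var, True)] for var in selected]   # one unit clause per selected axiom
        return not satisfiable(list(hard_clauses) + units, variables)

    def in_some_minimal_selection(var):
        others = [v for v in axiom_vars if v != var]
        for size in range(len(others) + 1):
            for rest in combinations(others, size):
                chosen = set(rest) | {var}
                # By monotonicity, minimality only needs single-axiom removals.
                if unsat(chosen) and all(not unsat(chosen - {v}) for v in chosen):
                    return True
        return False

    return {var for var in axiom_vars if in_some_minimal_selection(var)}


# Toy derivation: a1 and a2 derive the goal g, as do a3 and a4; a5 is irrelevant.
AXIOMS = ["a1", "a2", "a3", "a4", "a5"]
HARD = [
    [("a1", False), ("a2", False), ("g", True)],   # a1 and a2 imply g
    [("a3", False), ("a4", False), ("g", True)],   # a3 and a4 imply g
    [("g", False)],                                # negated conclusion
]
UNION = union_by_membership(AXIOMS, HARD)
```

The enumeration is exponential in the number of axioms and only serves to show what a dedicated MUS-membership tool would decide for each axiom.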

## 5 Repairing Ontologies

In this section we propose a notion of optimal repair and provide a method for computing all such optimal repairs.

###### Definition 2 (Optimal Repair)

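Let $\mathcal{O}$ be an ontology, $\alpha$ a GCI, and $\mathsf{Rep}(\mathcal{O}, \alpha)$ the set of all repairs for $\mathcal{O} \models \alpha$. We say $\mathcal{R} \in \mathsf{Rep}(\mathcal{O}, \alpha)$ is an optimal repair for $\mathcal{O} \models \alpha$ if $|\mathcal{R}| \geq |\mathcal{R}'|$ holds for every $\mathcal{R}' \in \mathsf{Rep}(\mathcal{O}, \alpha)$.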

That is, an optimal repair is a repair that removes the fewest axioms from the original ontology. It is also important to recall the notion of a hitting set.

###### Definition 3 (Hitting set)

We say $S$ is a hitting set for a collection of sets $S_1, \ldots, S_n$ if $S \cap S_i \neq \emptyset$ for every $1 \leq i \leq n$. A hitting set is minimal if none of its proper subsets is a hitting set.

We say $S$ is a smallest minimal hitting set if $|S|$ is the smallest among all minimal hitting sets. The following proposition shows how we can compute the set of all optimal repairs through a hitting set computation [ScCo-IJCAI03, LiSa-SAT05, BaPe-JLC10].

###### Proposition 1

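Let $\mathsf{Just}(\mathcal{O}, \alpha)$ be the set of all justifications for the GCI $\alpha$ w.r.t. the ontology $\mathcal{O}$. If $\mathbb{S}$ is the set of all smallest minimal hitting sets for $\mathsf{Just}(\mathcal{O}, \alpha)$, then $\{\mathcal{O} \setminus S \mid S \in \mathbb{S}\}$ is the set of all optimal repairs for $\mathcal{O} \models \alpha$.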

When the core is not empty, a set that consists of only one axiom from the intersection of all justifications is a smallest hitting set for all justifications. We get the following corollary, stating how to compute all optimal repairs faster in this case, as a simple consequence of Proposition 1.

###### Corollary 2

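Let $\mathcal{O}$ be an ontology, $\alpha$ a GCI and $\mathcal{C}$ the core for $\mathcal{O} \models \alpha$. If $\mathcal{C} \neq \emptyset$, then $\{\mathcal{O} \setminus \{\beta\} \mid \beta \in \mathcal{C}\}$ is the set of all optimal repairs for $\mathcal{O} \models \alpha$.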
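The relationship between Proposition 1 and Corollary 2 can be illustrated with a small brute-force sketch. It assumes that the justifications are already given as a non-empty list of sets, and it enumerates candidate hitting sets by increasing size, which is only feasible for tiny inputs; the function names are hypothetical.

```python
from itertools import combinations


def smallest_hitting_sets(justifications):
    """All hitting sets of minimum cardinality, found by enumeration in order of size."""
    universe = sorted(set().union(*justifications))
    for size in range(1, len(universe) + 1):
        found = [frozenset(candidate) for candidate in combinations(universe, size)
                 if all(just & set(candidate) for just in justifications)]
        if found:
            return found
    return []


def optimal_repairs(ontology, justifications):
    """Proposition 1: remove one smallest hitting set from the ontology."""
    return {frozenset(ontology) - hitting for hitting in smallest_hitting_sets(justifications)}


def optimal_repairs_from_core(ontology, core):
    """Corollary 2: with a non-empty core, remove exactly one core axiom."""
    return {frozenset(ontology) - {axiom} for axiom in core}


ONTOLOGY = frozenset({("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E"), ("X", "Y")})
JUSTIFICATIONS = [
    frozenset({("A", "B"), ("B", "C"), ("C", "E")}),
    frozenset({("A", "D"), ("D", "C"), ("C", "E")}),
]
CORE = frozenset.intersection(*JUSTIFICATIONS)
```

On this example both routes give the same single optimal repair, but the second one needs neither the full set of justifications nor a hitting set computation.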

The union of all justifications can also be used as a step towards deducing IAR entailments [journals/ki/Penaloza20].

## 6 Evaluation

To evaluate the performance of our algorithms on real-world ontologies, we built a prototypical implementation. The black-box algorithm is implemented in Java and uses the OWL API [HorridgeBechhofer2011] to access ontologies and HermiT [GlimmHorrocks2014] as a standard reasoner. The MUS-membership algorithm (MUS-MEM) is implemented in Python and calls cmMUS [janota2011cmmus] to detect whether a clause is a member of some MUS. The ontologies used in the evaluation come from the classification task at the ORE competition 2014 [ParsiaMatentzoglu2015]. Among them, we selected the ontologies that have fewer than 10,000 axioms, for a total of 95 ontologies. In the experiments, we computed a single justification, the intersection, and the union of all justifications for all atomic concept inclusions that are entailed by the ontologies (an atomic concept inclusion is an inclusion of the form $A \sqsubseteq B$, where $A$ and $B$ are concept names). All experiments ran on a machine with two Intel® Xeon® E5-2609v2 2.5GHz processors, 8 cores, and 64 GB of memory, under Ubuntu 18.04. All the figures in this section plot the logarithmic computation time (on the vertical axis) of each test instance (on the horizontal axis).

#### Computation time of the core vs. a single justification.

Fig. 2 compares the time to compute the core against the time to compute a single justification. The instances on the horizontal axis are ordered according to the single-justification computation time, represented by the black line. Orange dots represent the core computation time through Algorithm 1. Table 2 provides some basic statistics for comparison. Generally, computing the core is almost as fast as computing one justification, as expected. Note that, although computing the core and computing one justification are equally hard problems in terms of computational complexity, the size of the remaining ontology shrinks during the latter process. Intuitively, if $\mathcal{O}' \subseteq \mathcal{O}$, checking whether a subsumption is entailed by $\mathcal{O}'$ would be faster than checking it on $\mathcal{O}$.

#### Computation time of the union of all justifications.

As a benchmark, we use the OWL API to compute all justifications and then take their union. As our second algorithm can compute the union of all justifications only for ontologies in the DL accepted by condor, we separate our ontologies into two categories: those within this DL and those that are more expressive. The computation time for the union of all justifications for ontologies of the first category is shown in Fig. 3 (the cases with several justifications) and Fig. 4 (the cases with only one justification). Figs. 5 and 6 show the computation time of the union of all justifications for more expressive ontologies when there exist multiple justifications and only one justification, respectively. In Figs. 3–6, each blue, green or red dot corresponds to the computation time of the union for one conclusion by the OWL API, the black-box algorithm or the MUS-MEM algorithm, respectively. We order the conclusions along the horizontal axis by increasing computation time of the MUS-MEM algorithm in Figs. 3 and 4, and by the black-box performance in the latter two figures. We observe from these plots that the black-box algorithm outperforms the other methods and that, when available, MUS-MEM tends to perform better than a direct use of the OWL API.

#### Size comparisons for justifications, cores, and unions of justifications.

Fig. 7 illustrates the ratio of the size of the core to the size of a random justification and to the size of the union of all justifications. In our experiments, the intersection of all justifications is empty for only 2.35% of the subsumptions, which means that we could use Corollary 2 to compute optimal repairs for 97.65% of the cases. Moreover, for more than 85% of the cases, the size of a justification ($|\mathcal{J}|$) equals the size of the core ($|\mathcal{C}|$), which indicates that there exists only one justification. When several justifications exist (the second chart from the left of Fig. 7), the ratio of $|\mathcal{C}|$ to $|\mathcal{J}|$ for a random justification $\mathcal{J}$ falls between 50% and 75% for almost half of the cases. The right-most chart displays the distribution of the ratio of $|\mathcal{C}|$ to the size of the union of all justifications when there exist multiple justifications. The ratio is distributed quite evenly between 0% (exclusive) and 75%. Interestingly, the intersection of all justifications is empty for only 16% of the subsumptions even when several justifications exist.

## 7 Conclusions

In this paper, we presented algorithms for computing the core (that is, the intersection of all justifications) and the union of all justifications for a given DL consequence. Most of the algorithms are based on repeated calls to a (black-box) reasoner, and hence apply to ontologies and consequences of any expressivity, as long as a reasoner exists. The only exception is a MUS-based approach for computing the union of all justifications, which depends on the properties of the consequence-based method implemented by condor. Still, the approach should be generalisable without major problems to any language for which consequence-based reasoning methods exist, for instance, [CuGH19, CuGH18].

As an application of our work, we studied how to find optimal repairs effectively, using the information provided by the core and the union of all justifications. Through an empirical analysis, run over more than 100,000 consequences from almost a hundred ontologies from the ORE 2014 competition, we observe that our methods behave better in practice than the usual approach through the OWL API. A more detailed analysis of the experimental results is left for future work.

Our experiments also confirm an observation that has already been made for light-weight ontologies [Sunt-PhD09], and to a smaller degree in the ontologies from the BioPortal corpus [Bail-PhD13]; namely, that consequences tend to have one, or only a few, overlapping justifications. In our case, we exploit this fact, and the efficient core computation algorithm, to find optimal repairs in more than 97% of the test instances: those with a non-empty core, where removing any single axiom of the core leads to an optimal repair.