
Region-Based Merging of Open-Domain Terminological Knowledge

05/05/2022
by   Zied Bouraoui, et al.
National Institute of Advanced Industrial Science and Technology
cril.fr

This paper introduces a novel method for merging open-domain terminological knowledge. It takes advantage of the Region Connection Calculus (RCC5), a formalism used to represent regions in a topological space and to reason about their set-theoretic relationships. To this end, we first propose a faithful translation of terminological knowledge provided by several, potentially conflicting sources into region spaces. The merging is then performed on these spaces, and the result is translated back into the underlying language of the input sources. Our approach allows us to benefit from the expressivity and flexibility of RCC5 while dealing with conflicting knowledge in a principled way.


1 Introduction

Commonsense knowledge is playing an increasingly important role in the development of AI systems. Such knowledge is available, for example, as ontological knowledge in large open-domain terminological knowledge bases such as Cyc or SUMO, in knowledge graphs (KGs) such as DBpedia and WikiData, and as semantic markup (e.g., RDFa). Ontologies, as frameworks for expressing terminological knowledge, encode structured relations about the concepts and properties of a given domain. In this paper, we focus on terminological knowledge about concepts that can be expressed using ontologies, as the latter play an important role in areas such as the semantic web [Homburg2020], information retrieval [Zheng2019RLTMAE], natural language processing [Marco2018], and machine learning [Hohenecker2020], among others. For instance, Bouraoui and Schockaert (Bouraoui2018LearningCS) have shown that knowledge encoded in ontologies, used as prior conceptual knowledge, is useful for learning concept representations from few examples.

However, the available ontologies (and KGs, seen as simple ontologies) are inevitably incomplete: many rules and facts are missing. Several methods have been proposed for automated ontology (KG) completion [beltagy2013montague, DBLP:conf/nips/Rocktaschel017, Li2019] that exploit statistical regularities in a given ontology to predict plausible missing rules or facts. Unfortunately, meaningful knowledge is difficult to predict, especially when only a few examples of facts or rules are available. Moreover, as most existing approaches rely mainly on statistical regularities, the resulting predictions may conflict with each other. A repair mechanism is then required to maintain the consistency of the set of terminological statements, i.e., to ensure that there are no conflicting (or contradictory) statements. From the same perspective, to widen the coverage of terminological knowledge to several domains and to deal with incompleteness and conflicting statements, one may combine knowledge from several sources. However, merging open-domain knowledge bases turns out to be a particularly challenging task, as pointed out, for example, in [DBLP:conf/www/TanonVSSP16], which reports the problems and difficulties encountered when merging Freebase with WikiData. Conflicting information may arise when the statements of several sources are simply gathered together. Ontology merging and alignment/matching have also attracted much attention in the literature [Thiblin2018CANARDCM, Chang2019, Benferhat2019, Zhao2016FCAMapRF, Laadhar2017POMapRF]. These approaches generally study an equivalence matching between a source and a target taxonomy. In this work, we assume that all knowledge encoded by the different sources is already aligned and mapped, i.e., that the sources share the same terminology. Hence, while ontology alignment (or matching) is the process of determining correspondences between the terminologies of ontologies, ontology merging aims to combine two (or more) ontologies sharing the same terminology while handling conflicting statements.

Let us consider an example to illustrate the merging problem. Assume that a first source says that the concept $A$ is disjoint with the concept $B$, while another source says that every $A$ is a $B$. These two statements clearly conflict. To be faithful to both sources while resolving the conflict, a sensible choice would be to assume that $A$ and $B$ are not disjoint concepts, but that an $A$ is not necessarily a $B$, that is, the two concepts partially overlap. Such a result is consistent and can be seen as a good compromise between both sources. Finding a meaningful and relevant compromise between sources during the merging process is difficult, mainly because ontology languages (Description Logics, for example) are not expressive enough to capture salient knowledge that may emerge during the merging process. The simple example above shows pieces of knowledge that should be taken into account during merging but that cannot be captured in the ontology language. The problem of ontology (or DL) merging is close to the problem of belief merging in a propositional setting [Benferhat2019, Benferhat2014, Sri2016, Wang2012, EKM08a]. For instance, Benferhat et al. (Benferhat2019) studied the merging of assertional bases in the DL-Lite fragment; they determined the minimal subsets of assertions that resolve conflicts, based on the inconsistency minimization principle. Bouraoui et al. (Bouraoui2020) proposed a model-based operator for merging ontologies that solves the semantic conflicts arising during the merging process.

In this paper, to handle the merging problem, we take inspiration from conceptual spaces, which are geometric representation frameworks in which objects are represented as points in a topological space and concepts are modelled as regions [Haarslev97, Gardenfors:conceptualSpaces, Douven2022]. Motivated by the fact that conceptual knowledge in an ontology can, to some extent, be modelled as geometric objects and constraints on metric spaces [Bouraoui2020ModellingSC], this paper proposes an ontology merging method that takes advantage of qualitative spatial reasoning to find a relevant compromise between sources while resolving conflicts. Qualitative spatial reasoning is a suitable paradigm for reasoning efficiently about spatial entities and their relationships, where knowledge is represented as a so-called qualitative constraint network (QCN). Spatial information is usually represented in terms of basic or non-basic relations of a qualitative calculus, and reasoning tasks are then formulated as solving a set of qualitative constraints [Randell92, Cohn1997, Bhatt2011, Sioutis2020]. In particular, the Region Connection Calculus (RCC) is a well-studied formalism for qualitative topological representation and reasoning, including its fragments RCC-5 [Schockaert2013] and RCC-8 [Randell92]. Two significant advantages of the RCC framework are its ability to reason efficiently about the relationships between spatial entities and its ability to deal with conflicts in qualitative constraint merging, as shown in [Condotta2010, Thau2009]. In short, representing region constraints as QCNs provides more expressivity than DL axioms (or constraints); in particular, QCNs are expressive enough to allow disjunctions in the constraints.

Several QCN merging operators have been introduced in the literature (e.g., [Condotta2009, Condotta2010, Thau2009, Hue2012]). Roughly speaking, these operators compute a distance between QCN scenarios and the input QCNs; the scenarios with minimal distance are then selected as the best candidates for the merged result. Taking inspiration from these works, in this paper we take advantage of the RCC-5 formalism and propose a method for merging open-domain terminological knowledge (simply called ontologies) using QCNs. We first show how to translate such knowledge into qualitative spaces while preserving its semantics and properties. Second, we propose a merging operator that produces a single, consistent region space representing a compromise between sources. Lastly, we show how to express the region space in the input ontology language while maintaining all relevant information.

The proofs of propositions are available online at https://arxiv.org/abs/2205.02660.

2 Background

Our method is based on two complementary frameworks for merging open-domain terminological knowledge: we rely on a lightweight Description Logic (DL) framework to encode knowledge and use RCC-5 and qualitative constraints for performing the merging. This section briefly recalls the technical background required on these two topics.

Description Logics.

$\mathcal{EL}$ is a family of lightweight DLs which underlies the OWL 2 EL profile of the Web Ontology Language, and is considered one of the main representation formalisms for expressing terminological knowledge [Baader2005]. The main ingredients of DLs are individuals, concepts, and roles, which correspond at the semantic level to objects, sets of objects, and binary relations between objects. More formally, let $N_C$, $N_R$, $N_I$ be three pairwise disjoint sets, where $N_C$ denotes a set of atomic concepts, $N_R$ a set of atomic relations (roles), and $N_I$ a set of individuals. In this paper, we consider $\mathcal{EL}_\bot$ concept expressions [KRIEGEL2020172], which are built according to the following grammar:

$C, D ::= \top \mid \bot \mid A \mid C \sqcap D \mid \exists r.C$, where $A \in N_C$ and $r \in N_R$.

Let $C, D$ be concept expressions, $r \in N_R$, and $a, b \in N_I$. An ontology (a.k.a. knowledge base) $\mathcal{O} = \langle \mathcal{T}, \mathcal{A} \rangle$ comprises two components, the TBox $\mathcal{T}$ (Terminological Box) and the ABox $\mathcal{A}$ (Assertional Box). The TBox consists of a set of General Concept Inclusion (GCI) axioms of the form $C \sqsubseteq D$, meaning that $C$ is more specific than $D$ (or simply, $C$ is subsumed by $D$), and of axioms of the form $C \sqcap D \sqsubseteq \bot$, meaning that $C$ and $D$ are disjoint concepts. The ABox is a finite set of assertions about individuals of the form $C(a)$ or $r(a, b)$.

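The semantics is given in terms of interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$, which consist of a non-empty interpretation domain $\Delta^{\mathcal{I}}$ and an interpretation function $\cdot^{\mathcal{I}}$ that maps each individual $a \in N_I$ to an element $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$, each concept $A \in N_C$ to a subset $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, and each role $r \in N_R$ to a subset $r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$.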

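Syntax | Semantics
$\top$ | $\Delta^{\mathcal{I}}$
$\bot$ | $\emptyset$
$C \sqcap D$ | $C^{\mathcal{I}} \cap D^{\mathcal{I}}$
$\exists r.C$ | $\{d \in \Delta^{\mathcal{I}} \mid \exists e \in C^{\mathcal{I}} : (d, e) \in r^{\mathcal{I}}\}$
$C \sqsubseteq D$ | $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$
$C(a)$ | $a^{\mathcal{I}} \in C^{\mathcal{I}}$
$r(a, b)$ | $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in r^{\mathcal{I}}$
Table 1: Syntax and semantics of $\mathcal{EL}_\bot$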

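A summary of the syntax and semantics of $\mathcal{EL}_\bot$ is shown in Table 1. An interpretation $\mathcal{I}$ is said to be a model of (or to satisfy) an axiom $\alpha$ of the form given in the left column of the table, denoted by $\mathcal{I} \models \alpha$, when the corresponding condition in the right column is satisfied. For instance, $\mathcal{I} \models C \sqsubseteq D$ if and only if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$. Similarly, $\mathcal{I}$ satisfies a concept (resp. role) assertion, denoted by $\mathcal{I} \models C(a)$ (resp. $\mathcal{I} \models r(a, b)$), if $a^{\mathcal{I}} \in C^{\mathcal{I}}$ (resp. $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in r^{\mathcal{I}}$). An interpretation $\mathcal{I}$ is a model of an ontology $\mathcal{O}$ if it satisfies all the axioms and assertions in $\mathcal{O}$. An ontology is said to be consistent if it has a model; otherwise, it is inconsistent. An axiom $\alpha$ is entailed by an ontology $\mathcal{O}$, denoted by $\mathcal{O} \models \alpha$, if $\alpha$ is satisfied by every model of $\mathcal{O}$. We say that $C$ is subsumed by $D$ w.r.t. an ontology $\mathcal{O}$ iff $\mathcal{O} \models C \sqsubseteq D$. Similarly, we say that $a$ is an instance of $C$ w.r.t. $\mathcal{O}$ iff $\mathcal{O} \models C(a)$. An interpretation $\mathcal{I}$ is said to be fulfilling when each concept name of the ontology is non-empty in $\mathcal{I}$, i.e., $A^{\mathcal{I}} \neq \emptyset$ for each concept name $A \in N_C$.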
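As an illustration of these definitions, the following Python sketch checks whether a finite interpretation satisfies a set of axioms and assertions, and whether it is fulfilling. It is a minimal sketch under our own assumptions: only atomic inclusions $A \sqsubseteq B$, disjointness axioms $A \sqcap B \sqsubseteq \bot$, concept-name assertions and role assertions are supported, and the tuple encoding (`"sub"`, `"disj"`, `"concept"`, `"role"`) as well as all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """Finite interpretation I = (Delta^I, .^I), restricted to names (hypothetical encoding)."""
    domain: set
    concepts: dict = field(default_factory=dict)     # concept name -> subset of domain
    roles: dict = field(default_factory=dict)        # role name -> set of (d, e) pairs
    individuals: dict = field(default_factory=dict)  # individual name -> domain element

    def ext(self, concept):
        # Concept names that are not listed are interpreted as the empty set.
        return self.concepts.get(concept, set())


def satisfies(interp, axiom):
    """I |= axiom, for the restricted axiom forms encoded as tuples."""
    kind = axiom[0]
    if kind == "sub":      # A ⊑ B  iff  A^I ⊆ B^I
        return interp.ext(axiom[1]) <= interp.ext(axiom[2])
    if kind == "disj":     # A ⊓ B ⊑ ⊥  iff  A^I ∩ B^I = ∅
        return not (interp.ext(axiom[1]) & interp.ext(axiom[2]))
    if kind == "concept":  # A(a)  iff  a^I ∈ A^I
        return interp.individuals[axiom[2]] in interp.ext(axiom[1])
    if kind == "role":     # r(a, b)  iff  (a^I, b^I) ∈ r^I
        pair = (interp.individuals[axiom[2]], interp.individuals[axiom[3]])
        return pair in interp.roles.get(axiom[1], set())
    raise ValueError(f"unsupported axiom kind: {kind!r}")


def is_model(interp, ontology):
    """I is a model of the ontology iff it satisfies all its axioms and assertions."""
    return all(satisfies(interp, ax) for ax in ontology)


def is_fulfilling(interp, concept_names):
    """Every listed concept name has a non-empty extension."""
    return all(interp.ext(c) for c in concept_names)
```

For example, an interpretation with `Book` $\mapsto \{1\}$ and `Document` $\mapsto \{1, 2\}$ satisfies `("sub", "Book", "Document")` but not the converse inclusion.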

The main reasoning task considered for terminological ontologies is classification. It consists in computing all the entailed subsumptions ($A \sqsubseteq B$) and equivalences ($A \equiv B$) that hold between the atomic concepts of an ontology, or the concepts $\top$ or $\bot$. Such a procedure is described in [Baader2005]: it first transforms the ontology into a normal form using a set of rules, and then performs classification using a set of inference (completion) rules (see [Baader2005] for more details). In this paper, we assume that the input ontologies are given in a specific normal form, to which the completion rules for classification are applied. The purpose of this normalization is to transform complex axioms into simpler axioms over atomic concepts, which eases the translation process. That is, before translating the input ontologies into a region-based representation, we assume that each of them is in the strict normal form, i.e., that its TBox only consists of inclusions of the fundamental forms , , and where . This assumption is made without loss of generality, since for each ontology $\mathcal{O}$, one can compute an ontology in the strict normal form in polynomial time [Baader2005].
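To make the outcome of classification concrete in the simplest case, the Python sketch below computes the subsumptions and disjointness pairs entailed by a TBox containing only atomic inclusions $A \sqsubseteq B$ and disjointness axioms $A \sqcap B \sqsubseteq \bot$. This is a deliberately restricted illustration of our own, not the completion-rule procedure of [Baader2005]: roles, existential restrictions and unsatisfiable concepts are ignored, and the function name `classify` is hypothetical.

```python
from itertools import product


def classify(concepts, subs, disjs):
    """Entailed atomic subsumptions and disjointness pairs (restricted sketch).

    concepts: list of concept names; subs: pairs (A, B) for A ⊑ B;
    disjs: pairs (A, B) for A ⊓ B ⊑ ⊥.
    """
    # Reflexive-transitive closure of ⊑ (Warshall's algorithm, k is the outer loop).
    sub = {(a, a) for a in concepts} | set(subs)
    for k, i, j in product(concepts, repeat=3):
        if (i, k) in sub and (k, j) in sub:
            sub.add((i, j))
    # Disjointness is symmetric and inherited downwards along ⊑:
    # if A ⊑ A', B ⊑ B' and A' ⊓ B' ⊑ ⊥, then A ⊓ B ⊑ ⊥.
    base = set(disjs) | {(b, a) for a, b in disjs}
    disj = {
        (a, b)
        for a, b in product(concepts, repeat=2)
        if any((a, x) in sub and (b, y) in sub for x, y in base)
    }
    return sub, disj
```

With `Book ⊑ Document`, `Document ⊑ Text` and `Paper ⊓ Text ⊑ ⊥`, the sketch derives `Book ⊑ Text` and the disjointness of `Book` and `Paper`.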

Region Connection and Qualitative Constraints.

The RCC (Region Connection Calculus) formalism allows one to represent and reason about the relationships between spatial entities [Randell92]. Among the fragments of the RCC theory, RCC-5 is expressive enough to reason about set-theoretic relations between regions [Bennett94, Jonsson1997ACC]. In RCC-5, regions can simply be interpreted as non-empty subsets of a given set, and the focus is on a set $\mathcal{B} = \{DR, PO, EQ, PP, PPi\}$ of five binary relations between regions, called basic relations. The set $\mathcal{B}$ is jointly exhaustive and pairwise disjoint, that is, each pair of regions satisfies exactly one relation from $\mathcal{B}$: the relation $DR$ (resp. $PO$, $EQ$, $PP$) holds between two regions when they are disjoint (resp. when they partially overlap, when they are equal, when the first is a strict subset of the second), and $PPi$ is the converse of $PP$. Based on $\mathcal{B}$, more complex information about the relative positions of a set of regions can be represented by means of qualitative constraint networks (QCNs). Formally, a QCN is a pair $\mathcal{N} = (V, C)$, where $V$ is a set of region variables representing the spatial entities and $C$ is a set of binary constraints between these entities. Each constraint associates a pair of variables $(x, y) \in V \times V$ with a set of basic relations $R \in 2^{\mathcal{B}}$, and is simply denoted by $(x, R, y)$, where $R \subseteq \mathcal{B}$; it is said to be a singleton constraint whenever $R$ is a singleton.

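An interpretation of a QCN $\mathcal{N} = (V, C)$ is defined as $\mathcal{I} = (D, \cdot^{\mathcal{I}})$, where $D$ is a non-empty set (the domain of the regions) and $\cdot^{\mathcal{I}}$ is an interpretation function that maps each variable $x \in V$ to a non-empty subset $x^{\mathcal{I}} \subseteq D$. Table 2 specifies how singleton constraints are interpreted in RCC-5, i.e., an interpretation $\mathcal{I}$ of $\mathcal{N}$ satisfies a singleton constraint $(x, \{b\}, y)$, denoted by $\mathcal{I} \models (x, \{b\}, y)$, if the relation between $x^{\mathcal{I}}$ and $y^{\mathcal{I}}$ given in the table holds (e.g., $\mathcal{I} \models (x, \{PP\}, y)$ whenever $x^{\mathcal{I}} \subsetneq y^{\mathcal{I}}$). The satisfaction relation is extended to any (non-singleton) constraint as follows: for each $R \subseteq \mathcal{B}$, $\mathcal{I} \models (x, R, y)$ iff $\mathcal{I} \models (x, \{b\}, y)$ for some $b \in R$ (e.g., $\mathcal{I} \models (x, \{PP, EQ\}, y)$ iff $x^{\mathcal{I}} \subsetneq y^{\mathcal{I}}$ or $x^{\mathcal{I}} = y^{\mathcal{I}}$). An interpretation $\mathcal{I}$ of $\mathcal{N}$ is said to be a solution of $\mathcal{N}$, denoted by $\mathcal{I} \models \mathcal{N}$, iff $\mathcal{I} \models c$ for each $c \in C$. A QCN is consistent iff it admits a solution. A sub-network of $\mathcal{N}$ is a QCN $\mathcal{N}' = (V, C')$ such that $R' \subseteq R$ whenever $(x, R', y) \in C'$ and $(x, R, y) \in C$. A quasi-atomic QCN is a QCN where, for each pair of variables $x, y \in V$, there is a unique constraint $(x, R, y)$, and where $R$ is either a singleton or . A scenario of a QCN $\mathcal{N}$ is a quasi-atomic sub-network of $\mathcal{N}$. Note that a QCN is consistent if it admits a consistent scenario.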

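Name (Symbol) | Syntax | Semantics
Proper Part of (PP) | $(x, \{PP\}, y)$ | $x^{\mathcal{I}} \subsetneq y^{\mathcal{I}}$
Inverse PP of (PPi) | $(x, \{PPi\}, y)$ | $y^{\mathcal{I}} \subsetneq x^{\mathcal{I}}$
Equals (EQ) | $(x, \{EQ\}, y)$ | $x^{\mathcal{I}} = y^{\mathcal{I}}$
Disjoint From (DR) | $(x, \{DR\}, y)$ | $x^{\mathcal{I}} \cap y^{\mathcal{I}} = \emptyset$
Partially Overlaps (PO) | $(x, \{PO\}, y)$ | $x^{\mathcal{I}} \cap y^{\mathcal{I}} \neq \emptyset$, $x^{\mathcal{I}} \not\subseteq y^{\mathcal{I}}$, $y^{\mathcal{I}} \not\subseteq x^{\mathcal{I}}$
Table 2: Syntax and semantics of RCC-5, $x, y \in V$.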
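The semantics of Table 2 can be sketched in Python by computing the unique basic relation holding between two finite non-empty sets, which directly yields the satisfaction relation for (possibly non-singleton) constraints and for solutions. This is our own minimal sketch: regions are Python sets, a constraint is encoded as a triple `(x, R, y)` with `R` a set of relation names, and the helper names are hypothetical.

```python
DR, PO, EQ, PP, PPI = "DR", "PO", "EQ", "PP", "PPi"


def rcc5_relation(x, y):
    """Unique RCC-5 basic relation between two finite non-empty sets (Table 2)."""
    if not x or not y:
        raise ValueError("RCC-5 regions must be non-empty")
    if not (x & y):
        return DR
    if x == y:
        return EQ
    if x < y:   # strict subset
        return PP
    if y < x:   # strict superset
        return PPI
    return PO   # non-empty intersection, neither set contains the other


def satisfies(interp, constraint):
    """interp maps variables to sets; (x, R, y) holds iff the relation is in R."""
    x, rels, y = constraint
    return rcc5_relation(interp[x], interp[y]) in rels


def is_solution(interp, constraints):
    """An interpretation is a solution iff it satisfies every constraint."""
    return all(satisfies(interp, c) for c in constraints)
```

Because the five relations are jointly exhaustive and pairwise disjoint, the function always returns exactly one relation, and a disjunctive constraint is satisfied exactly when that relation belongs to it.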

3 Merging Framework Description

This short section summarizes our method for merging open-domain terminological knowledge (i.e., ontologies) using qualitative constraint networks (QCNs). As highlighted in the introduction, QCNs are expressive enough to capture some relevant information that might emerge during the merging process, which in turn allows one to select a consistent compromise between sources when expressing the merging result.

The first task is to find a "faithful translation" from an ontology to a QCN that preserves the semantics of the ontology and the knowledge initially encoded in it. Once every input ontology is translated into a QCN, we are given a multiset of QCNs, also called a profile. The second task is then to define a merging operator that maps the QCN profile to a single QCN, representing the profile in a global and consistent way. As the constraints of the merged QCN are sets of basic relations, this QCN cannot, in general, be translated back into the target ontology language. We therefore select one of its "best" consistent scenarios which, in contrast, can be expressed as an ontology.

More precisely, our approach involves the following main steps:

  1. The translation of terminological knowledge sources (i.e., the input ontologies given in the strict normal form) into QCNs (Section 4). In this step, we present a translation function that ensures the faithfulness of the translation from an ontology into a QCN.

  2. The definition of a QCN merging operator (Section 5). Exploiting the notion of “distance” between basic relations and constraints, this step associates the input QCN profile (the translated ontologies) with a single merged, consistent QCN.

  3. The selection of a "best" consistent scenario of the merged QCN as the candidate merged result (Section 6). This selection process takes advantage of a notion of distance between scenarios and some plausible instantiations of the input ontologies (i.e., some interpretations).

  4. The translation of the selected consistent scenario back into an ontology, i.e., the underlying language of the input sources (Section 7).

4 Translating Terminological Knowledge into QCNs

In this section, we present a translation function from any ontology to a QCN. More precisely, we (1) map atomic concept names to QCN variables and axioms to constraints, and (2) show that the translation is faithful to the TBox of the initial ontology.

Definition 1 (Forward translation ).

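A forward translation $t$ is a function mapping each atomic concept name $A \in N_C$ to a QCN variable $t(A)$. The function $t$ is extended to map axioms in the (strict) normal form into constraints as follows:

  • $t(A \sqsubseteq B) = (t(A), \{PP, EQ\}, t(B))$, and

  • $t(A \sqcap B \sqsubseteq \bot) = (t(A), \{DR\}, t(B))$.

Moreover, $t$ is extended to translate ontologies in the (strict) normal form into sets of constraints in the expected way: $t(\mathcal{O}) = \{t(\alpha) \mid \alpha \in \mathcal{T}\}$, where $\mathcal{T}$ is the TBox of $\mathcal{O}$.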
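The forward translation can be sketched as follows, assuming (in line with the RCC-5 semantics for non-empty regions) that $A \sqsubseteq B$ is mapped to $\{PP, EQ\}$ and $A \sqcap B \sqsubseteq \bot$ to $\{DR\}$. The variable naming `x_A`, the tuple encoding of axioms and the dictionary representation of a QCN (one entry per ordered pair of distinct variables, unconstrained pairs being mapped to all of $\mathcal{B}$) are our own assumptions.

```python
BASIC = frozenset({"DR", "PO", "EQ", "PP", "PPi"})
CONVERSE = {"DR": "DR", "PO": "PO", "EQ": "EQ", "PP": "PPi", "PPi": "PP"}


def var(concept):
    """Hypothetical naming scheme for the QCN variable of an atomic concept."""
    return "x_" + concept


def translate_axiom(axiom):
    """Map a normal-form axiom relating two distinct concepts to a constraint (x, R, y)."""
    kind, a, b = axiom
    if kind == "sub":   # A ⊑ B      ->  (x_A, {PP, EQ}, x_B)
        return var(a), frozenset({"PP", "EQ"}), var(b)
    if kind == "disj":  # A ⊓ B ⊑ ⊥  ->  (x_A, {DR}, x_B)
        return var(a), frozenset({"DR"}), var(b)
    raise ValueError(f"unsupported axiom kind: {kind!r}")


def translate_tbox(concepts, tbox):
    """Translate a TBox into a QCN {(x, y): R}; pairs without axioms keep R = B."""
    qcn = {(var(a), var(b)): BASIC for a in concepts for b in concepts if a != b}
    for axiom in tbox:
        x, rels, y = translate_axiom(axiom)
        # Intersect with any earlier constraint on the same pair, and keep the
        # converse constraint on (y, x) in sync.
        qcn[(x, y)] = qcn[(x, y)] & rels
        qcn[(y, x)] = qcn[(y, x)] & frozenset(CONVERSE[r] for r in rels)
    return qcn
```

Storing both orientations of each pair keeps lookups symmetric; a more compact design would store only pairs with `x < y` and compute converses on demand.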

To show that the translation is faithful, we provide a "semantic" mapping from the models of an ontology to the solutions of its translated QCN, and conversely. Let us first show how models of an ontology correspond to solutions of its translation:

Definition 2 (Flattening of an interpretation).

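Let $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ be a fulfilling interpretation. The flattening of $\mathcal{I}$ is the QCN interpretation with domain $\Delta^{\mathcal{I}}$ that maps the variable associated with each atomic concept name $A$ to the region $A^{\mathcal{I}}$.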

Non-fulfilling interpretations are not considered in this paper: regions in RCC-5 are non-empty subsets, so an empty concept extension has no corresponding region.
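The following sketch illustrates the flattening and the statement of Theorem 1 on a toy interpretation: the flattening keeps the domain and assigns to each concept variable the extension of its concept, after which the translated constraints can be checked directly. The encoding (concept extensions as a dictionary, the `x_` variable prefix, and the translation of $A \sqsubseteq B$ to $\{PP, EQ\}$ and of $A \sqcap B \sqsubseteq \bot$ to $\{DR\}$) is our own assumption, not the authors' implementation.

```python
def rcc5_relation(x, y):
    """Unique RCC-5 basic relation between two non-empty sets."""
    if not (x & y):
        return "DR"
    if x == y:
        return "EQ"
    if x < y:
        return "PP"
    return "PPi" if y < x else "PO"


def flatten(concept_ext):
    """Flattening of a fulfilling interpretation given by its concept extensions."""
    if any(not ext for ext in concept_ext.values()):
        raise ValueError("only fulfilling interpretations can be flattened")
    return {"x_" + a: frozenset(ext) for a, ext in concept_ext.items()}


def translated_constraints(tbox):
    """Assumed forward translation of atomic inclusions and disjointness axioms."""
    rels = {"sub": {"PP", "EQ"}, "disj": {"DR"}}
    return [("x_" + a, rels[kind], "x_" + b) for kind, a, b in tbox]


def is_solution(interp, constraints):
    """Every constraint (x, R, y) is satisfied by the regions of x and y."""
    return all(rcc5_relation(interp[x], interp[y]) in r for x, r, y in constraints)
```

Refusing to flatten a non-fulfilling interpretation mirrors the remark above: an empty extension cannot be an RCC-5 region.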

Theorem 1.

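Let $\mathcal{O}$ be an ontology, and let $\mathcal{I}$ be a fulfilling interpretation that satisfies every axiom of the TBox of $\mathcal{O}$. Then the flattening of $\mathcal{I}$ is a solution of the QCN obtained by translating $\mathcal{O}$.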

Conversely, let us show how solutions of the translated QCN correspond to interpretations satisfying all axioms of the TBox.

Definition 3 (Inflation of a solution).

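Let $\mathcal{S} = (D, \cdot^{\mathcal{S}})$ be a solution of a QCN whose variables are associated with the atomic concept names. Any interpretation $\mathcal{I}$ with domain $\Delta^{\mathcal{I}} = D$ such that, for every atomic concept name $A$, $A^{\mathcal{I}}$ is the region assigned by $\mathcal{S}$ to the variable associated with $A$, is called an inflation of $\mathcal{S}$.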

Intuitively, an inflation of a solution is an interpretation blown up from the solution by interpreting atomic concept names in the same way as their corresponding variables are "populated" by the solution. Note that there are as many possible inflations of a solution as there are ways of interpreting individual names and role names over its domain. An immediate consequence of Definition 3 is that every inflation of a solution is fulfilling.
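Conversely, the concept part of an inflation can be sketched by reading extensions off a solution over a given domain $D$; any interpretation of individual and role names over the same domain completes it. The dictionary encoding, the `x_` prefix and the names `inflate` and `satisfies_tbox` are hypothetical, and only atomic inclusions and disjointness axioms are checked.

```python
def inflate(domain, solution, concepts):
    """Concept part of an inflation: domain D and A^I := region of x_A.

    solution maps variables to non-empty subsets of `domain`; individuals and
    roles are not built here, since any choice over `domain` is allowed.
    """
    extensions = {a: set(solution["x_" + a]) for a in concepts}
    return set(domain), extensions


def satisfies_tbox(extensions, tbox):
    """Check A ⊑ B (subset) and A ⊓ B ⊑ ⊥ (empty intersection) on extensions."""
    for kind, a, b in tbox:
        if kind == "sub" and not extensions[a] <= extensions[b]:
            return False
        if kind == "disj" and extensions[a] & extensions[b]:
            return False
    return True
```

Since every region of a solution is non-empty, the resulting extensions are all non-empty, which is the fulfilling property noted above.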

Theorem 2.

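Let $\mathcal{O}$ be an ontology and let $\mathcal{S}$ be a solution of the QCN obtained by translating $\mathcal{O}$. Then there is an inflation $\mathcal{I}$ of $\mathcal{S}$ such that $\mathcal{I} \models \alpha$ for each axiom $\alpha$ of the TBox of $\mathcal{O}$.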

Theorems 1 and 2 establish that our translation is faithful, i.e., that the set of all fulfilling models of an ontology is captured precisely by its translated QCN.

5 QCN Merging

We reduce the merging of a profile of ontologies $\langle \mathcal{O}_1, \ldots, \mathcal{O}_n \rangle$ to the merging of a profile of QCNs $\mathcal{P} = \langle \mathcal{N}_1, \ldots, \mathcal{N}_n \rangle$, where each $\mathcal{N}_i$ is the translation of $\mathcal{O}_i$, based on the faithful translation given in the previous section. Inspired by works on syntactic QCN merging [Condotta2010], this QCN merging process is summarized as follows. We associate with the profile $\mathcal{P}$ a single merged, consistent QCN $\mathcal{N}$ representing $\mathcal{P}$ in a "global" way. This is performed constraint-wise: for each pair of variables $x, y$, we associate each basic relation $b \in \mathcal{B}$ with a value representing its distance to the profile of constraints $\mathcal{P}_{x,y}$, i.e., the constraints between $x$ and $y$ in the input QCNs. This distance is the key tool for building the constraint between $x$ and $y$ in the merged QCN $\mathcal{N}$. Intuitively, each such constraint corresponds to the set of basic relations with the lowest distances to the profile $\mathcal{P}_{x,y}$, while ensuring that the resulting QCN is consistent.

Example 1.

Consider the profile of ontologies ,,, that encodes the following knowledge about the four concepts Paper, Text, Document and Book, respectively denoted by , , and . An implementation illustrating our method is also available at https://github.com/ontologymerging/MergingOntologyWithQCN.

  • = = , = ,

  • = = , = ,

  • = = , = ,

  • = = , = .

Using the forward translation presented in the previous section, one associates with this profile of ontologies a profile of QCNs. The four QCNs are depicted in Figure 1 (to lighten the figures, we do not represent a constraint $(x, R, y)$ when $R = \mathcal{B}$, i.e., when the QCN does not provide any information about the relationship between $x$ and $y$).

Figure 1: QCN profile of Example 1

Although we do not require it in general, note that in this example each ontology is consistent, i.e., the TBox of each input ontology does not contain conflicting information. As a direct consequence of Theorem 1, each QCN is consistent. However, simply combining these QCNs can easily lead to conflicts. For instance, no basic relation is shared by the constraints and , since in we have that whereas in we have that . This calls for our QCN merging procedure.

5.1 Computing a distance between a basic relation and a profile of constraints

We start by considering a distance between basic relations. First introduced in [Freksa1992TemporalRB] in the context of temporal reasoning, the notion of conceptual neighborhood (CN) between relations was later adapted to QCN merging in [Condotta2008] to define such a distance. Intuitively, two basic relations $b$ and $b'$ are CNs if a continuous transformation of two regions satisfying $b$ leads them to directly satisfy $b'$, without satisfying any other basic relation of $\mathcal{B}$ in between. For instance, $EQ$ and $PP$ are CNs, since "shrinking" a region $x$ initially equal to another region $y$ directly makes it a proper part of $y$. This results in the neighborhood relation $\{(DR, PO), (PO, PP), (PO, PPi), (PP, EQ), (PPi, EQ)\}$ (together with the symmetric pairs). This neighborhood relation induces a neighborhood graph whose vertices are the elements of $\mathcal{B}$, with an edge between two basic relations $b$ and $b'$ whenever they are CNs. The distance $d(b, b')$ between two basic relations is defined as the length of the shortest path between $b$ and $b'$ in the neighborhood graph. For instance, $d(EQ, PP) = 1$ since $EQ$ and $PP$ are CNs, and $d(DR, PP) = 2$ since $DR$ and $PO$ (resp. $PO$ and $PP$) are CNs, but $DR$ and $PP$ are not. This distance is extended to a distance between a basic relation $b$ and a constraint $R$, defined as $d(b, R) = \min_{b' \in R} d(b, b')$. Lastly, given two variables $x, y$, the distance between each $b \in \mathcal{B}$ and the profile of constraints $\mathcal{P}_{x,y} = \langle R_1, \ldots, R_n \rangle$, where $R_i$ is the constraint between $x$ and $y$ in the $i$-th input QCN, is defined by $d(b, \mathcal{P}_{x,y}) = \sum_{i=1}^{n} d(b, R_i)$.
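These distances can be computed by breadth-first search over the neighborhood graph. The Python sketch below assumes the neighborhood edges DR/PO, PO/PP, PO/PPi, PP/EQ and PPi/EQ (our assumption of the standard RCC-5 graph, consistent with the EQ/PP example above), a constraint encoded as a set of relation names and a profile as a list of such sets; the function names are hypothetical.

```python
from collections import deque

BASIC = ("DR", "PO", "EQ", "PP", "PPi")
# Assumed edges of the RCC-5 conceptual neighborhood graph (undirected).
EDGES = {("DR", "PO"), ("PO", "PP"), ("PO", "PPi"), ("PP", "EQ"), ("PPi", "EQ")}


def neighbors(b):
    return [v for u, v in EDGES if u == b] + [u for u, v in EDGES if v == b]


def basic_distance(b1, b2):
    """Length of a shortest path between two basic relations (BFS)."""
    seen, queue = {b1: 0}, deque([b1])
    while queue:
        u = queue.popleft()
        if u == b2:
            return seen[u]
        for v in neighbors(u):
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    raise ValueError("relations are not connected in the graph")


def constraint_distance(b, rels):
    """d(b, R) = min over b' in R of d(b, b')."""
    return min(basic_distance(b, r) for r in rels)


def profile_distance(b, profile):
    """d(b, P_xy) = sum over the input constraints R_i of d(b, R_i)."""
    return sum(constraint_distance(b, rels) for rels in profile)
```

With these edges, the largest distance between two basic relations is 3 (between DR and EQ), and a source providing no information ($R = \mathcal{B}$) contributes 0 to every sum.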

Example 1 (continued).

Let us focus on (T) and (D). We have . For the distance between the basic relation and , we have that , where:

We get that . The distances between each basic relation of $\mathcal{B}$ and the profile of constraints, for each pair of variables, are summarized in Table 3.

8 2 2 2 4 6
4 2 3 1 4 4
4 2 5 0 6 3
0 3 4 1 4 4
0 3 6 0 6 3
Table 3: Distances between relations from and the profile of constraints , for each pair of variables .

5.2 Using the distance to build a merged consistent QCN

We now describe our procedure, which associates a profile of QCNs with a merged, consistent QCN. It takes advantage of the distance $d(b, \mathcal{P}_{x,y})$ between a basic relation $b$ and the profile $\mathcal{P}_{x,y}$ of constraints between $x$ and $y$. Let us first formally define two intermediate functions and which are used in our procedure. Given a total preorder $\preceq$ (i.e., a reflexive, transitive and total relation) over a finite set $E$, let us denote by $\min(E, \preceq)$ the set of minimal elements of $E$ w.r.t. $\preceq$, i.e., $\min(E, \preceq) = \{e \in E \mid \forall e' \in E,\ e \preceq e'\}$. Each pair of variables $x, y$ is associated with a total preorder $\preceq_{x,y}$ over the basic relations of $\mathcal{B}$, defined for all $b, b' \in \mathcal{B}$ as $b \preceq_{x,y} b'$ iff $d(b, \mathcal{P}_{x,y}) \leq d(b', \mathcal{P}_{x,y})$. Then the function is a mapping defined for each as . It corresponds to the relaxation of a constraint w.r.t. the total preorder $\preceq_{x,y}$. Noteworthy, corresponds to the set of basic relations with a minimal distance to the profile of constraints $\mathcal{P}_{x,y}$. The function is a mapping defined for each as . For instance, according to Table 3 and focusing on and (cf. ), we get that , that , that , and that .
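The following sketch shows one plausible reading of the two intermediate functions, which we label as an assumption: relaxing a constraint $R$ adds the relations of $\mathcal{B} \setminus R$ that are minimal w.r.t. the preorder induced by the distances (so relaxing $\emptyset$ yields the relations of minimal distance), and the "controversy" value of a constraint is the largest distance among its relations. The names `minimal`, `relax` and `cost`, and the dictionary `dist` mapping each basic relation $b$ to $d(b, \mathcal{P}_{x,y})$, are hypothetical.

```python
BASIC = ("DR", "PO", "EQ", "PP", "PPi")


def minimal(elements, key):
    """min(E, ⪯) for the total preorder induced by `key`: all elements of minimal key."""
    elements = list(elements)
    if not elements:
        return set()
    best = min(key(e) for e in elements)
    return {e for e in elements if key(e) == best}


def relax(rels, dist):
    """Assumed relaxation: add the closest basic relations not yet in `rels`."""
    return set(rels) | minimal(set(BASIC) - set(rels), key=dist.get)


def cost(rels, dist):
    """Assumed controversy score of a non-empty constraint: its largest distance."""
    return max(dist[b] for b in rels)
```

Relaxing a constraint equal to $\mathcal{B}$ leaves it unchanged, which is why only constraints different from $\mathcal{B}$ are candidates for relaxation.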

We are ready to introduce our main procedure, outlined in Algorithm 1. It first defines an initial QCN by setting each of its constraints to the set of basic relations from $\mathcal{B}$ having a minimal distance to the corresponding profile of constraints. If this QCN is consistent, it is returned as the merged QCN. If not, some of its constraints are relaxed, in the sense that some basic relations are added to them. The constraints to relax are selected as follows. The candidates are first restricted to the constraints that can be relaxed, i.e., those not equal to $\mathcal{B}$. Among those candidates, one selects the constraints having the highest value. Indeed, we do not want to relax first the constraints consisting of basic relations that are "close" to the input profile, but would rather prioritize the relaxation of more "controversial" constraints, i.e., those with a high value. For instance, let us look back at Table 3. It can be seen that , and thus ; this low value reflects the consensus between sources on the fact that one of the basic relations holds between and , and indeed it can be verified that the axiom is consistent with each input TBox. On the contrary, one has that ; this higher value reflects a disagreement between the input sources about the relationship between the concepts and . In general, whenever possible and in order to restore the consistency of the merged QCN, it is a sensible choice to keep unchanged the constraints unanimously accepted by the input sources, and to weaken the most disputed constraints first. This relaxation process is repeated until the resulting QCN is consistent, which is guaranteed after a finite number of iterations: each iteration adds at least one basic relation to some constraint, and a QCN whose constraints all equal $\mathcal{B}$ is consistent.

input: A profile of QCNs
output: A merged, consistent QCN
1 begin
       // Initialization of the output QCN
2       foreach  do
3            
4       while  is not consistent do
             // One relaxes some constraints of
5             foreach  do
6                  
7            
8      
Algorithm 1 Computing a merged QCN
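The procedure can be sketched end to end in Python under several assumptions of our own: variables are numbered `0..n-1`, the input is the table `dist[(i, j)][b]` of distances $d(b, \mathcal{P}_{i,j})$ for each pair `i < j`, relaxation adds the next-closest relations, the constraints whose largest distance is highest are relaxed first, and consistency is checked by brute force rather than with a SAT solver as in [Condotta2016]. The brute-force check is exact for RCC-5, because a network over n variables is consistent iff some set of non-empty membership patterns over the n regions (Venn cells) satisfies every constraint, but it is exponential and only suitable for a handful of variables.

```python
from itertools import product

BASIC = ("DR", "PO", "EQ", "PP", "PPi")


def relation(cells, i, j):
    """RCC-5 relation between regions i and j, given the chosen Venn cells."""
    both = any(c[i] and c[j] for c in cells)
    only_i = any(c[i] and not c[j] for c in cells)
    only_j = any(c[j] and not c[i] for c in cells)
    if not both:
        return "DR"
    if only_i and only_j:
        return "PO"
    if only_i:
        return "PPi"
    return "PP" if only_j else "EQ"


def consistent(n, qcn):
    """Brute-force RCC-5 consistency; qcn maps pairs (i, j), i < j, to relation sets."""
    cells = [c for c in product((False, True), repeat=n) if any(c)]
    for mask in range(1, 2 ** len(cells)):
        chosen = [c for k, c in enumerate(cells) if mask >> k & 1]
        if not all(any(c[v] for c in chosen) for v in range(n)):
            continue  # every region must be non-empty
        if all(relation(chosen, i, j) in rels for (i, j), rels in qcn.items()):
            return True
    return False


def minimal(rels, d):
    """Relations of `rels` with minimal distance according to `d`."""
    if not rels:
        return set()
    best = min(d[b] for b in rels)
    return {b for b in rels if d[b] == best}


def merge(n, dist):
    """Sketch of Algorithm 1 under the assumptions stated in the lead-in."""
    qcn = {p: minimal(set(BASIC), d) for p, d in dist.items()}
    while not consistent(n, qcn):
        # Only constraints different from B can be relaxed; an all-B network is
        # consistent, so this list is non-empty inside the loop.
        candidates = [p for p in qcn if len(qcn[p]) < len(BASIC)]
        top = max(max(dist[p][b] for b in qcn[p]) for p in candidates)
        for p in candidates:
            if max(dist[p][b] for b in qcn[p]) == top:
                qcn[p] = qcn[p] | minimal(set(BASIC) - qcn[p], dist[p])
    return qcn
```

For instance, with hypothetical distances that make the initial constraints PP, PP and DR on a chain of three variables (an inconsistent combination), the sketch leaves the two PP constraints untouched and relaxes only the third constraint, twice, until PP becomes admissible for it.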
Example 1 (continued).

Initially, the merged QCN is defined by the following set of constraints, which correspond to the basic relations highlighted in Table 3 (this QCN is also depicted in Figure 2(a)):

This QCN is inconsistent: one can see that the constraints and imply by transitivity that the relation between and must be or , yet . The constraint is then selected (cf. Algorithm 1) as the only candidate for relaxation at this point, since , which is the highest value among all constraints. And since , one updates to , which results in the QCN depicted in Figure 2(b). This QCN is again inconsistent; explaining its inconsistency is more involved, as it involves dependencies between all four variables, and we omit the details for space reasons. The constraint is then selected for relaxation () and one updates to . The resulting QCN (cf. Figure 2(c)) is consistent and is returned by the procedure.

Figure 2: The QCNs iteratively generated by our algorithm. Fig. 2(c) corresponds to the final consistent merged QCN.

Algorithm 1 runs in time polynomial in the size of the input QCN profile, given access to an NP oracle for one step (the consistency test of the while loop). Indeed, (i) checking the consistency of an RCC-5 QCN can be performed by taking advantage of a standard SAT solver [Condotta2016]; (ii) the number of possible relaxations of a given constraint is bounded by a constant (the number of RCC-5 basic relations), and as a consequence, the number of iterations of the while loop is in ; and (iii) the functions and are computed in since the distance is computed in , again because the number of RCC-5 basic relations is bounded by a constant. This places Algorithm 1 in , which is reminiscent of the complexity of the inference problem for propositional belief merging, i.e., , for a vast majority of the existing operators [Konieczny2004].

Notably, the QCN merging step of our framework can be seen as an adaptation of [Condotta2010] that yields a single QCN as the merged result (the output of Algorithm 1); the work of Condotta et al. [Condotta2010] does not provide such a procedure.

6 Selecting a Representative Scenario of the Merged QCN

Once we have obtained a merged, consistent QCN, our goal is to express it in the initial (target) ontology language. However, not every constraint of the QCN (i.e., a subset of $\mathcal{B}$) can easily be translated into a set of axioms, since non-singleton constraints express disjunctive information between two concepts/regions. Moreover, since the merged QCN is consistent, it necessarily admits at least one consistent scenario, and since a scenario involves singleton constraints (as well as the two constraints , ), it can easily be translated into a single ontology, as shown in the next section. Our aim is then to (i) focus on all consistent scenarios of the merged QCN, and (ii) select one representative scenario. This can be done by exploiting the information provided by the input ABoxes, as we show in the rest of this section.
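Enumerating the consistent scenarios of a small merged QCN can be sketched by brute force. For simplicity, the sketch below only enumerates atomic scenarios (one basic relation per pair), which is our simplification of the quasi-atomic scenarios defined in Section 2, and it uses an exact but exponential consistency check over Venn cells; the encoding (variables `0..n-1`, pairs `i < j`) and the names are hypothetical.

```python
from itertools import product


def relation(cells, i, j):
    """RCC-5 relation between regions i and j, given the chosen Venn cells."""
    both = any(c[i] and c[j] for c in cells)
    only_i = any(c[i] and not c[j] for c in cells)
    only_j = any(c[j] and not c[i] for c in cells)
    if not both:
        return "DR"
    if only_i and only_j:
        return "PO"
    if only_i:
        return "PPi"
    return "PP" if only_j else "EQ"


def consistent(n, qcn):
    """Exact brute-force RCC-5 consistency for a few variables."""
    cells = [c for c in product((False, True), repeat=n) if any(c)]
    for mask in range(1, 2 ** len(cells)):
        chosen = [c for k, c in enumerate(cells) if mask >> k & 1]
        if all(any(c[v] for c in chosen) for v in range(n)) and all(
            relation(chosen, i, j) in rels for (i, j), rels in qcn.items()
        ):
            return True
    return False


def consistent_scenarios(n, qcn):
    """Yield every consistent atomic scenario (one basic relation per pair)."""
    pairs = sorted(qcn)
    for choice in product(*(sorted(qcn[p]) for p in pairs)):
        scenario = {p: {b} for p, b in zip(pairs, choice)}
        if consistent(n, scenario):
            yield scenario
```

On a chain where regions 0 and 1, and 1 and 2, are both in relation PP, only the scenario assigning PP to the pair (0, 2) survives, whatever other relations that constraint allows.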

Figure 3: The consistent scenarios of the merged QCN.

In our running example, the merged QCN admits four consistent scenarios, which are depicted in Figure 3. Let us first discuss why these four scenarios seem to be reasonable candidates given the input ontologies/QCNs provided by the sources. First, note that all input ontologies state that ( in the QCN profile), and this consensus is reflected in the four candidate scenarios, which all entail that information. Second, while the two sources and state that and are disjoint concepts ( in the corresponding input QCNs), only one source says that (). In this case, it makes sense to follow the point of view of the majority of the sources; accordingly, in all four scenarios we have that , thus and are disjoint concepts. Third, the source states that , and all other sources have no information on these concepts. It is sensible to keep this information in the merged result, and accordingly all four scenarios entail it. Moreover, holds in all scenarios, i.e., is a strict part of ; stated otherwise, both concepts cannot be equal while keeping the relationships between the remaining concepts consistent. This emergent property is another interesting feature of the merging process. Lastly, there are four equally reasonable candidate scenarios because strong disagreements hold on the relationships between the concepts and on the one hand, and between the concepts and on the other hand. Accordingly, the four scenarios differ only on the constraints between these two pairs of concepts ( / , and / ).

What remains to be done is to select one of these four scenarios. For this purpose, we take advantage of the ABoxes of the input ontologies and examine how these ABoxes relate to each scenario. To be as faithful as possible to what each input source says, instead of simply considering each input ABox as such, we consider its “closure” according to the corresponding TBox. For instance, if a given source states that in its TBox and that in its ABox, then it makes sense to consider that also holds in that source’s implicit knowledge. Formally, let be an ontology. The deductive closure of w.r.t. , denoted by , is defined as [Baader2007, Benferhat2015HowTS]. Accordingly, is logically equivalent to .
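The closure step can be sketched for a deliberately small fragment, which is our simplifying assumption: a TBox made only of atomic concept inclusions, encoded as pairs (sub, sup), and an ABox of concept assertions, encoded as pairs (concept, individual). Saturating the ABox under these inclusions until a fixpoint yields the concept assertions entailed in this fragment; richer constructors or role assertions would require an actual description-logic reasoner. The function name `abox_closure` and the concept names in the usage lines are hypothetical.

```python
def abox_closure(tbox, abox):
    """Saturate concept assertions under atomic concept inclusions.

    tbox: iterable of (sub, sup) pairs, each read as the inclusion sub ⊑ sup.
    abox: iterable of (concept, individual) assertions.
    Returns the set of assertions entailed in this restricted fragment.
    """
    inclusions = list(tbox)
    closed = set(abox)
    changed = True
    while changed:  # repeat until no inclusion adds a new assertion
        changed = False
        for concept, individual in list(closed):
            for sub, sup in inclusions:
                if concept == sub and (sup, individual) not in closed:
                    closed.add((sup, individual))
                    changed = True
    return closed


# Hypothetical usage: Student ⊑ Person, Person ⊑ Agent, Student(ann).
tbox = [("Student", "Person"), ("Person", "Agent")]
print(sorted(abox_closure(tbox, {("Student", "ann")})))
# prints [('Agent', 'ann'), ('Person', 'ann'), ('Student', 'ann')]
```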

To select the representative scenario, we define a distance between a scenario and the set of all input (closed) ABoxes; the representative scenario is the one at minimal distance. Given a scenario, the idea is to count the individuals in each input ABox that raise a conflict w.r.t. the constraints of that scenario. This can naturally be done for any scenario constraint where . For instance, if and hold in the ABox of a given source, and holds in the scenario under consideration, then according to that ABox is an individual that raises a conflict with that scenario. As another example, if holds but not , then is not a member of the concept in the ABox (recall that ABoxes are closed w.r.t. their TBox); in that case raises a conflict with the constraints when . More formally, given an ABox , a scenario constraint where , and an individual , we say that raises a conflict with w.r.t.  when:

The number of conflicts raised by an ontology w.r.t. a scenario constraint where , is defined as .
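Because the formal conflict condition is not legible in this version of the text, the following sketch encodes one reading of the two examples given above, which we label as an assumption: for c1 DR c2, an individual in both concepts conflicts; for c1 PP c2, an individual in c1 but not in c2 conflicts (and symmetrically for PPi); for c1 EQ c2, an individual in exactly one of the two concepts conflicts. PO is deliberately excluded, since it is treated separately. A closed ABox is encoded as a set of (concept, individual) pairs; the function names are hypothetical.

```python
from collections import defaultdict


def raises_conflict(concepts, c1, c2, relation):
    """ASSUMED test: does an individual with these (closed) concept memberships
    conflict with the scenario constraint `c1 relation c2` (relation != "PO")?"""
    in1, in2 = c1 in concepts, c2 in concepts
    if relation == "DR":   # disjoint concepts: being in both is a conflict
        return in1 and in2
    if relation == "PP":   # c1 strictly inside c2: c1 without c2 is a conflict
        return in1 and not in2
    if relation == "PPi":  # c2 strictly inside c1: c2 without c1 is a conflict
        return in2 and not in1
    if relation == "EQ":   # equal concepts: being in exactly one is a conflict
        return in1 != in2
    raise ValueError(f"no individual-level conflict test for {relation!r}")


def count_conflicts(closed_abox, c1, c2, relation):
    """Number of individuals of a closed ABox raising a conflict with `c1 relation c2`."""
    concepts_of = defaultdict(set)
    for concept, individual in closed_abox:
        concepts_of[individual].add(concept)
    return sum(raises_conflict(cs, c1, c2, relation) for cs in concepts_of.values())
```

Only individuals that occur in the ABox are inspected, which matches the idea of counting individuals "in each input ABox".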

The case of is more complex. Indeed, it is easy to see that no individual can raise a conflict with . This is because all the other basic relations express explicit dependencies between concepts / regions, whereas is a complementary relation that (explicitly) expresses a notion of independence between concepts. For instance, the concepts and can naturally be thought of as independent, in the sense that in a real-world context one can easily find individuals that are members of both concepts, of only one of them, or of neither (this should not be confused with the case of , which expresses an explicit dependency between concepts, e.g., the concepts and ). So, to evaluate the number of “conflicts” raised by an ABox w.r.t. a constraint , we propose to measure how “unbalanced” the numbers of conflicts are w.r.t. the remaining forms of constraints. Formally, focusing on the scenario constraint between two variables and , . For instance, when , then : since individuals can be found equally (i) in both underlying concepts and (ii) in one concept but not the other, raises no conflict w.r.t.  .
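The balance argument rests on three kinds of individuals for a pair of concepts: members of both, members of exactly one, and members of neither. The sketch below only tallies these three groups from a closed ABox, encoded as a set of (concept, individual) pairs; how the counts enter the score for the PO constraint is given by the paper's formula, which this sketch does not attempt to reproduce. Counting only individuals that occur in the ABox is our assumption, and `membership_profile` is a hypothetical name.

```python
from collections import defaultdict


def membership_profile(closed_abox, c1, c2):
    """Count the individuals of a closed ABox that belong to both concepts,
    to exactly one of them, or to neither."""
    concepts_of = defaultdict(set)
    for concept, individual in closed_abox:
        concepts_of[individual].add(concept)
    profile = {"both": 0, "exactly_one": 0, "neither": 0}
    for concepts in concepts_of.values():
        hits = (c1 in concepts) + (c2 in concepts)  # 0, 1 or 2 memberships
        profile[("neither", "exactly_one", "both")[hits]] += 1
    return profile
```

A profile such as one individual in both concepts and one in exactly one of them is the kind of evenly spread situation the text describes as raising no conflict for PO.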

We now have a way to select the representative scenario from a given set of candidates. The distance between a scenario and the input profile of ontologies is defined as the overall number of conflicts raised by all input ABoxes w.r.t. all constraints of , i.e., . Given a set of candidate scenarios, the representative scenario is then the one with minimal distance.
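Since the distance is a plain sum, the selection step can be sketched independently of how individual conflicts are counted: the counting function is passed in as a parameter. Scenarios are encoded as dictionaries mapping concept pairs to basic relations, and closed ABoxes as sets of (concept, individual) pairs. The text does not state a tie-breaking rule; Python's `min` keeps the first candidate among equally distant ones, which is our choice. `toy_conflicts` is a hypothetical stand-in that counts only DR conflicts, used to make the sketch runnable.

```python
def distance(scenario, closed_aboxes, conflicts):
    """Total conflicts raised by all closed ABoxes w.r.t. all scenario constraints."""
    return sum(
        conflicts(abox, c1, c2, relation)
        for abox in closed_aboxes
        for (c1, c2), relation in scenario.items()
    )


def representative(scenarios, closed_aboxes, conflicts):
    """Candidate scenario at minimal distance (first one wins on ties)."""
    return min(scenarios, key=lambda s: distance(s, closed_aboxes, conflicts))


def toy_conflicts(abox, c1, c2, relation):
    # Hypothetical stand-in: only DR constraints are checked, and an individual
    # asserted in both concepts counts as one conflict.
    if relation != "DR":
        return 0
    members1 = {ind for concept, ind in abox if concept == c1}
    members2 = {ind for concept, ind in abox if concept == c2}
    return len(members1 & members2)


aboxes = [{("A", "x"), ("B", "x")}]
print(representative([{("A", "B"): "DR"}, {("A", "B"): "PO"}], aboxes, toy_conflicts))
# prints {('A', 'B'): 'PO'}
```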

Example 1 (continued).

Let us go back to our running example. We have that . So focusing on and (i.e., on the scenario constraints between the variables and ), we have that , and one can easily verify that . Summing up all conflicts, we get that , , , and . Hence, the scenario is selected as a representative scenario of the merged QCN.

7 Translating the Representative Scenario into Terminological Knowledge

The last step of our framework is to translate the selected representative scenario back into an ontology. The translation is defined as follows.

Definition 4 (Backward translation ).

A backward translation is a function s.t. . This function is extended to map constraints into an ontology as follows, where , , and are new concept names and , , and are new individual names (note that the constraint cannot be translated into a set of GCIs only, hence the use of ABox assertions in the translation):

  • ;

  • ;