Method for the semantic indexing of concept hierarchies, uniform representation, use of relational database systems and generic and case-based reasoning

10/03/2019
by Uwe Petersohn, et al.

This paper presents a method for semantic indexing and describes its application in the field of knowledge representation. The starting point of semantic indexing is knowledge represented by concept hierarchies. The goal is to assign keys to nodes (concepts) that are hierarchically ordered and syntactically and semantically correct. The indexing algorithm computes keys such that concepts are partially unifiable with all more specific concepts and only semantically correct concepts can be added. The keys represent terminological relationships. The correctness and completeness of the underlying indexing algorithm are proven. The use of classical relational databases for the storage of instances is described. Because of the uniform representation, inference can be done using case-based reasoning and generic problem solving methods.


1 Introduction

Problem statement

Modern methods of knowledge representation, like description logic (e.g. [9], [8]) and ontologies (e.g. [28]), fulfill the properties of formal semantics and high expressiveness. Furthermore, they enable powerful inference procedures and are suitable for a wide range of applications. Various capable development environments and software tools exist.

The main aim of this paper is the development of a methodology that, on one hand, permits the conceptual recording and structuring of the application domain through concepts, concept hierarchies, multi-axial composition of concepts and concept descriptions. On the other hand, it uses a representation that allows for dynamic dialog systems, case-based and generic problem solving methods and their interactions with relational databases to be implemented in one system. This modeling is, for example, relevant in medical domains, when solutions to problems require the inclusion of knowledge from experience (treated cases) in addition to generic knowledge (textbook knowledge).

Representational paradigm

A uniform and structured representation of the domain’s concepts and concept hierarchies as well as the conceptual knowledge of the domain of discourse is achieved through a mapping to “semantic indices” (computed keys). A partial unification of keys connects concepts of the domain, concept hierarchies and the defining concept descriptions and instances.

Starting point of the semantic indexing are the concept hierarchies of the represented domain of discourse. The goal of the semantic indexing is assigning a key to each node and each concept, with the key of a certain concept summarizing all keys for nodes of that concept (Chapter 2). The keys are computed with the presented algorithm “semantic indexing” (Chapter 3) and represent terminological relationships. Correctness and completeness of the underlying indexing algorithm are proven. Each key also contains its inheritance path. Through a multiaxial modeling of concept hierarchies, a clear description is possible even for complex concepts, situations and expressions (Chapter 4). The integrative application of different inference methods is possible (Chapters 5, 6).

Databases

Knowledge processing systems often store their data in very different ways and are frequently focused on specific data structures for efficient access. The approach in this paper provides a uniform representation that permits the modeling and representation of practical domains and the storage of instances in databases (Chapter 7), while also ensuring efficient analysis of the represented data and knowledge. It connects the fields of knowledge representation and relational databases.

Using database systems to maintain structures and instances in a uniform representation makes efficient access to the stored data and knowledge possible and allows the extensive capabilities of modern database systems to be exploited.

Applications

As a result of the uniform knowledge representation, the setup, architecture and implementation of knowledge bases can be improved. This is primarily done through the structured storage of knowledge from concept hierarchies, the storage of semantically clearly defined instances in a knowledge base as well as an efficient retrieval. Different problem solving methods can be applied, especially in combination with each other, to evaluate the knowledge under different aspects, for example using logic-based, concept-based and case-based reasoning (Chapter 6).

Related Work

Various concept languages for terminological logics with different syntax, expressiveness and complexity exist, e.g. KRIS¹, LOOM², KRYPTON³ or ALCQI⁴. Other important languages are CRACK, KANDOR, SHF, NIKL, and SHIQ. Similarly, there exist different inference systems (e.g. FaCT⁵, RACER⁶) and development environments (e.g. OntoSaurus⁷). For the computation of subsumption there are essentially two methods: NC-algorithms and tableau methods. NC-algorithms are based on the syntactical comparison of the deduced concept descriptions. Their correctness is easy to prove, but most NC-algorithms are incomplete. Tableau methods, as implemented in KRIS, for example, are theorem provers with backward chaining. Determining the subsumption of deduced concepts is turned into determining a contradiction and can be answered with conventional problem solving methods. Correctness and completeness are provable. Terminological reasoning is decidable but NP-complete. Computing the subsumption for complex applications often entails exponential time complexity. The following compromises are possible:

  • Forgoing negation, disjunction and quantification respectively, to reduce the expressiveness and achieve polynomial time complexity (KRYPTON, CLASSIC⁸)

  • Forgoing completeness and decidability of inferences (NIKL, LOOM, KL-ONE⁹)

  • Accepting the exponential computing time (KRIS)

However, correct and complete inferences with polynomial time complexity are only possible with severely limited terminological logics.

¹ KRIS (Knowledge Representation and Inference System) is a terminological logic with high expressiveness that was developed at the DFKI (German Research Center for Artificial Intelligence) [10].
² LOOM is a terminological logic with very high expressiveness that was developed at the University of California. PowerLOOM, the successor of LOOM, is even more expressive and uses a variant of the language KIF (Knowledge Interchange Format) [3].
³ KRYPTON combines a frame-based representational language with theorem provers for first-order predicate logic [11].
⁴ ALCQI is based on the standard description logic ALC, extended by qualifying number restrictions and converse roles [30].
⁵ FaCT is a system that is based on tableau methods and supports OWL (Web Ontology Language) and OWL 2 [1].
⁶ RACER (Renamed ABox and Concept Expression Reasoner) is a knowledge representation system that was developed at the University of Hamburg, based on the very expressive logic SHIQ. It implements a strongly optimized ABox tableau method [20].
⁷ OntoSaurus is a graphical web browser for knowledge bases that are based on LOOM [29].
⁸ CLASSIC was developed by AT&T Laboratories and has implementations in Lisp, C and C++. It is one of the less expressive systems [13].
⁹ KL-ONE, as a representational language, is the origin of terminological logics. It is based on the formalization and generalization of the principles of frames and semantic networks and serves the construction of complex structured concept descriptions [12].

For ontologies, various languages are available as well. Either informal graphical languages (e.g. CML¹⁰) or formal description languages (e.g. Ontolingua¹¹, CycL¹², FLogic¹³, RDFS¹⁴, OIL¹⁵, DAML+OIL¹⁶) can be used. In addition, there are development environments like OntoStudio¹⁷ or OntoSaurus, which support the creation of ontologies. For inferences, ontology definitions are mapped to concrete operationalizations of the logical and operationalizational layer, which possess formal semantics and enable correct and complete inferences. Depending on the language used in the representation vocabulary layer, the mapping happens in the logical and operationalizational layer. Depending on the chosen operationalization, subsumption (exponential time complexity) or a predicate logic theorem prover (polynomial to exponential time complexity) can be applied as the inference method.

¹⁰ The Conceptual Modelling Language describes non-functional requirements of applications that were implemented in the programming language C [27].
¹¹ Ontolingua is based on KIF (Knowledge Interchange Format) and the frame ontology and was developed especially for the formal specification of ontologies. KIF is a description language that was built on predicate logic and extended with language primitives, so that meta-statements about relations can be made [17].
¹² CycL is the language on which the knowledge representation system CYC is based [6].
¹³ FLogic integrates frame-based languages with the predicate calculus and contains object-oriented approaches [22].
¹⁴ RDF Schema is a W3C standard language for the XML-based representation of ontologies [4].
¹⁵ OIL is a web-based representation and inference layer for ontologies which builds on RDFS and expands its expressiveness [15].
¹⁶ DAML+OIL was developed for the realization of the semantic web. It is an independent continuation of OIL, but its development has not continued since 2001 because OWL is regarded as its successor [2].
¹⁷ OntoStudio is an ontology development environment for creating and modifying ontologies. It supports, among others, the languages OWL and RDFS [7].

Newer inference tools are being developed on the basis of description languages; however, they are mostly based on the established languages mentioned above. For instance, [21] proposes the XR-tree (XML Region Tree), a dynamic external-memory index structure for strictly nested XML data, for retrieval. For a given element, all its ancestors and/or descendants can be efficiently identified in an element set indexed by an XR-tree. The newly devised stack-based structural join algorithm, XR-stack, evaluates the structural relationship between two XR-tree-indexed element sets. It avoids unnecessary element scans by skipping ancestors and descendants that have no matches. Experiments showed that XR-stack significantly outperforms previous state-of-the-art algorithms [21].

Another approach, presented in [14], is LiteMat, an inference encoding scheme for large RDF¹⁸ graphs. Inferences are based on RDFS and the owl:sameAs property. The proposed extensions, which integrate the owl:TransitiveProperty and owl:inverseOf properties, make it possible to reach RDFS++ expressiveness. This is achieved by assigning meaningful identifiers to elements of the TBox and ABox, which is efficient regarding memory space, query processing and encoding speed [14].

¹⁸ RDF (Resource Description Framework) is a standard developed by the W3C that forms the basis of the semantic web. Every statement consists of the 3-tuple subject, predicate, object [5].

2 Basic concepts and introductory examples

The following chapters will introduce the representational paradigm and illustrate it with examples.

2.1 Concept hierarchies

Definition 1 (Concept hierarchy)

A concept hierarchy is a tree whose elements are concepts with the following properties:

  1. A node must not have more than one child node with the same concept.

  2. The dependency graph for the concept hierarchy is free of cycles.

The first property means that it is not possible for multiple nodes with the same concept to belong to the same parent node. A hierarchy is supposed to establish an order between its elements. Therefore, such a repetition would not be meaningful. In other words: Two identical child nodes being attached to the same parent node would contain the same information as only one of them being attached.

Figure 1 shows an example of a concept hierarchy. It is a tree whose nodes are concepts. Each concept can appear multiple times.

Figure 1: Simple concept hierarchy and the respective dependency graph.

A dependency graph can be created for every hierarchy (Figure 1, on the right). It contains all occurring concepts as nodes. In the dependency graph, an edge points from “node A” to “node B” if “node A” is at least once in the concept hierarchy directly below “node B”. In the example, “concept 2” and “concept 3” are directly below the root of the concept hierarchy “concept 1”. Therefore, edges from “concept 2” and “concept 3” point to “concept 1” in the dependency graph.

To explain the second part of Definition 1, a few more terms have to be introduced.

A circuit in a graph is a walk $(v_0, v_1, \dots, v_n)$ in which start and end node are the same, i.e. $v_0 = v_n$. A circuit is a cycle if $(v_0, \dots, v_{n-1})$ is a path. The absence of cycles has to be proven.
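Both properties of Definition 1 can be checked mechanically. The following Python sketch shows one possible way, assuming a hypothetical encoding of a concept hierarchy as nested `(concept, children)` tuples (not taken from the paper). It builds the dependency graph with edges pointing from a concept to the concepts directly above it, as described for Figure 1, and searches for a cycle with a depth-first search. The tree `figure2` mirrors the hierarchy of Figure 2 as far as it can be read off Table 1.

```python
from collections import defaultdict


def check_concept_hierarchy(root):
    """Check Definition 1 for a tree of (concept, children) tuples (hypothetical encoding)."""
    edges = defaultdict(set)  # dependency graph: lower concept -> concepts directly above it
    stack = [root]
    while stack:
        concept, children = stack.pop()
        child_concepts = [child[0] for child in children]
        # Property 1: no node has two child nodes with the same concept.
        if len(child_concepts) != len(set(child_concepts)):
            return False
        for child in children:
            edges[child[0]].add(concept)
            stack.append(child)

    # Property 2: the dependency graph is free of cycles (DFS with three colours).
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every concept starts WHITE (= 0)

    def reaches_cycle(concept):
        colour[concept] = GREY
        for upper in edges[concept]:
            if colour[upper] == GREY:
                return True
            if colour[upper] == WHITE and reaches_cycle(upper):
                return True
        colour[concept] = BLACK
        return False

    return not any(colour[c] == WHITE and reaches_cycle(c) for c in list(edges))


def localization():
    return ("localization", [("spine", []), ("head", []), ("shoulder/arm/hand", [])])


figure2 = ("pain pattern", [
    ("cardinal symptom", [localization()]),
    ("radiating pain", [localization(), ("intensity", [("high", []), ("medium", [])])]),
])
print(check_concept_hierarchy(figure2))                      # True
print(check_concept_hierarchy(("A", [("B", [("A", [])])])))  # False: cycle A -> B -> A
```

Note that "localization" occurring twice is allowed: it repeats in different places of the tree, not under the same parent, and it does not create a cycle in the dependency graph.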

Definition 2 (Path)

A path in the concept hierarchy is a sequence of nodes $(n_1, \dots, n_m)$ in which all $n_i$ with $1 \le i \le m$ are nodes in the hierarchy and all $n_{i+1}$ with $1 \le i < m$ are directly below $n_i$ in the hierarchy.

Definition 3 (More specific, more general)

A concept $c_1$ is more specific than a concept $c_2$ in a certain concept hierarchy if there is a path in the dependency graph that leads from $c_1$ to $c_2$. Conversely, a concept $c_1$ is more general than a concept $c_2$ in a certain concept hierarchy if there is a path in the dependency graph that leads from $c_2$ to $c_1$.

Keys are another important feature of semantic indexing. To work with them, some notions have to be defined formally.

Definition 4 (Keys)

A key is a comma-separated list $[a_1, \dots, a_n]$. Every element $a_i$ with $1 \le i \le n$ of this list is either a constant or a variable $x$.

Only one variable symbol, $x$, is used in the concept hierarchy. However, it can appear at multiple positions in the list, and its occurrences are completely independent from each other.

Definition 5 (Length of a key)

The length of a key is the number of elements of that key.

Definition 6 (Partial instance)

A key $k'$ of length $n$ is a partial instance of a key $k$ of length $n$ if at least one variable is substituted by a constant $c$: $k'(i) = c$ for some $i$ with $1 \le i \le n$ and $k(i) = x$.

Definition 7 (Instance)

A key $k'$ of length $n$ is an instance of a key $k$ of length $n$ if variables get substituted by constants and $k'(i) = k(i)$ for all $i$ with $1 \le i \le n$ and $k(i) \neq x$.

Definition 8 (Set of all instances)

For a key $k$, $\mathcal{I}(k)$ is the set of all instances of $k$.

Instances of a key can be represented by substituting the variables at different positions with constants. It is not necessary to substitute all variables.
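Definitions 6 to 8 translate directly into a few list operations. In this minimal Python sketch, a key is a Python list, constants are integers or strings, and the variable is the string `"x"`; these encodings are assumptions made for illustration. Because the set of all instances (Definition 8) is only finite over a finite set of constants, the hypothetical helper `instances` takes such a set as an extra parameter.

```python
from itertools import product

X = "x"  # the single variable symbol of Definition 4 (assumed encoding)


def is_instance(candidate, key):
    """Definition 7: same length; positions where `key` holds a constant must agree."""
    return len(candidate) == len(key) and all(
        k == X or c == k for c, k in zip(candidate, key))


def is_partial_instance(candidate, key):
    """Definition 6: an instance in which at least one variable became a constant."""
    return is_instance(candidate, key) and any(
        k == X and c != X for c, k in zip(candidate, key))


def instances(key, constants):
    """Definition 8, restricted to a finite set of constants (an assumption).
    Each variable may stay a variable or be replaced by any constant."""
    choices = [[X, *constants] if k == X else [k] for k in key]
    return [list(t) for t in product(*choices)]


print(is_instance([0, 1, 0], [0, X, 0]))          # True
print(is_partial_instance([0, X, 0], [0, X, 0]))  # False: nothing substituted
print(instances([0, X, 0], [0, 1]))               # [[0, 'x', 0], [0, 0, 0], [0, 1, 0]]
```

The key itself counts as an instance, which matches the remark that not all variables have to be substituted.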

Furthermore, a hierarchy of keys has to be defined for this context as well. Keys can be more specific than other keys in two ways: They can be longer or they can substitute variables with constant symbols.

Definition 9 (Initial key)

For a key $k = [a_1, \dots, a_n]$ and $1 \le m \le n$, $k|_m$ denotes the initial key $[a_1, \dots, a_m]$.

Definition 10 (Position within a key)

For a key $k = [a_1, \dots, a_n]$ and $1 \le i \le n$, $k(i)$ denotes $a_i$, the $i$-th position in $k$.

Definition 11 (Partial unification)

A key $k_1$ is partially unifiable with a key $k_2$ if there exists an instance of $k_1$ that is an initial key of an instance of $k_2$.

Partial unifiability will now be explained with an example.

Example 1 (Partially unifiable)

is partially unifiable with , because the instance of is an initial key of the instance of .

An indexing algorithm has to compute the keys such that a concept is partially unifiable with all more specific concepts and only semantically correct concepts can be inserted. In the database, only the instances are represented. The complete trees are not stored. This allows efficient access and the usage of keys for retrieval, inferences and other processes.

The indexing algorithm creates node and concept keys.

Definition 12 (Node keys, concept keys)

The key of a node $n$ is the node key $k_n$. The key of a concept $c$ is the concept key $\underline{k}_c$ (concept keys are underlined).

This distinction is necessary because concepts can appear multiple times in the concept hierarchy. A concept key has to correctly index its concept in all places of the concept hierarchy. The node key represents the instances, including the inheritance path, within the database.

Example 2 (Node keys, concept keys)

The following table compares node and concept keys.

Table 1: Example for the distinction between concept and node keys.

concept              node key(s)               concept key
pain pattern         [0]                       [0]
cardinal symptom     [0,0]                     [0,0]
radiating pain       [0,1]                     [0,1]
localization         [0,0,0], [0,1,0]          [0,x,0]
intensity            [0,1,1]                   [0,1,1]
spine                [0,0,0,0], [0,1,0,0]      [0,x,0,0]
head                 [0,0,0,1], [0,1,0,1]      [0,x,0,1]
shoulder/arm/hand    [0,0,0,2], [0,1,0,2]      [0,x,0,2]
high                 [0,1,1,0]                 [0,1,1,0]
medium               [0,1,1,1]                 [0,1,1,1]

For illustration purposes, the corresponding graph with node and concept keys (underlined) is shown in Figure 2.

Figure 2: Example graph for the distinction between concept keys and node keys.

Thus, every more specific key can be partially unified with a more general key, because an instance of the more general key is an initial key of the more specific key. Using partial unification, all necessary parts of the hierarchy can be reconstructed unambiguously.

The keys are here represented as terms in list syntax. Other syntactical representations are of course possible. The list unification of the terms can be implemented very effectively. Because independent variables and constants are used, it is merely a partial matching.
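Because every occurrence of the variable is independent, partial unification (Definition 11) reduces to the positionwise matching mentioned above. The first key must not be longer than the second, and at each of its positions the two values must be equal or one of them must be the variable. The Python sketch below implements this observation. It again assumes lists with the string `"x"` as the variable, and it takes its test keys from Table 1 and Listing 2.

```python
X = "x"  # variable symbol (assumed encoding)


def initial_key(key, m):
    """Definition 9: the first m positions of `key`."""
    return key[:m]


def partially_unifiable(k1, k2):
    """Definition 11 as positionwise matching: some instance of k1 is an initial key of
    some instance of k2 iff k1 is not longer than k2 and every position of k1 either
    agrees with k2 or holds x on one side (each x can be substituted independently)."""
    if len(k1) > len(k2):
        return False
    return all(a == b or X in (a, b) for a, b in zip(k1, initial_key(k2, len(k1))))


# Concept key of "localization" vs. a node key of "head" (Table 1)
print(partially_unifiable([0, X, 0], [0, 1, 0, 1]))                  # True
# Concept key of "intensity" vs. concept key of "head": position 3 differs
print(partially_unifiable([0, 1, 1], [0, X, 0, 1]))                  # False
# Keys from Listing 2 with the prefix letter "a"
print(partially_unifiable(["a", X, 0, X, 0], ["a", 1, 0, 1, 0, 1]))  # True
```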

2.2 Example “Simple anamnesis”

This chapter presents an example for the specifics of the knowledge representation and the indexing algorithm. The formal details will be explained in the following chapters.

A highly simplified anamnesis is examined. The terminology is shown in Figure 3. Inheritance takes place from the roots to the leaves. For example, the node “strong” indicates that a strong pain intensity was detected during the anamnesis.

Figure 3: Simple anamnesis tree as well as uniform knowledge representation.

Now the indexing algorithm has to assign keys to all concepts. The details of this process are described in the algorithm specification (Chapter 3.2). After all concepts have been assigned keys, the hierarchy in Figure 3 can be represented with the keys shown in Listing 1.

([0] "anamnesis" ([0,0] [0,1]))
([0,0] "pain pattern" ([0,0,1] [0,0,2] [0,0,3]))
([0,1] "feeling")
([0,0,1] "localization" ([0,0,1,0] [0,0,1,1] [0,0,1,2]))
([0,0,1,0] "spine")
([0,0,1,1] "head")
([0,0,1,2] "shoulder/arm/hand")
([0,0,2] "quality")
([0,0,3] "intensity" ([0,0,3,0] [0,0,3,1]))
([0,0,3,0] "strong")
([0,0,3,1] "very strong")
Listing 1: Possible representation of the tree with keys.

Usually the keys also contain variables. Nodes in the hierarchy that are not leaves are also given additional references to the concepts underneath. This hierarchy is simple insofar as each concept only appears once. In general this is not the case.

To store an aspect of the anamnesis, a sequence of child nodes is selected, starting from the root node with the key [0] and ending when the most specific valid node is reached. This node has a key that represents the chosen node and the whole path from the root to the node itself.

Now the obtained key, e.g. [0,0,1,1], can be stored in the database with a unique primary key id. From this key it is possible to unambiguously reconstruct every valid key of the hierarchy. In this case the key specifies the concept “head”. The key [0,0,1,1] can be partially unified (Definition 11) with [0,0,1], which stands for “localization”. From here one arrives at “pain pattern” and finally “anamnesis”. A node is specified through the path to itself (in the example, the concept itself, without the path, is sufficient for identification, but in general concepts can appear multiple times).
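The storage and path reconstruction just described can be sketched with Python's built-in `sqlite3` module. The table layout (`concept`, `finding`, column `node_key`) and the comma-separated text encoding of keys are hypothetical choices for this illustration, not the database schema of Chapter 7. Since the keys of Listing 1 contain no variables, each initial key can be looked up by exact match; keys with variables would need partial unification instead.

```python
import sqlite3

# Listing 1 as (key, concept) pairs; keys are stored as comma-separated text (assumption).
LISTING_1 = [
    ("0", "anamnesis"), ("0,0", "pain pattern"), ("0,1", "feeling"),
    ("0,0,1", "localization"), ("0,0,1,0", "spine"), ("0,0,1,1", "head"),
    ("0,0,1,2", "shoulder/arm/hand"), ("0,0,2", "quality"),
    ("0,0,3", "intensity"), ("0,0,3,0", "strong"), ("0,0,3,1", "very strong"),
]


def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE concept (node_key TEXT PRIMARY KEY, name TEXT NOT NULL)")
    con.execute("CREATE TABLE finding (id INTEGER PRIMARY KEY, node_key TEXT NOT NULL)")
    con.executemany("INSERT INTO concept VALUES (?, ?)", LISTING_1)
    return con


def reconstruct_path(con, finding_id):
    """Return the concepts from the root down to the node stored under `finding_id`."""
    (key,) = con.execute(
        "SELECT node_key FROM finding WHERE id = ?", (finding_id,)).fetchone()
    positions = key.split(",")
    path = []
    for m in range(1, len(positions) + 1):  # every initial key, shortest first
        prefix = ",".join(positions[:m])
        (name,) = con.execute(
            "SELECT name FROM concept WHERE node_key = ?", (prefix,)).fetchone()
        path.append(name)
    return path


con = build_db()
con.execute("INSERT INTO finding (id, node_key) VALUES (?, ?)", (1, "0,0,1,1"))
print(reconstruct_path(con, 1))  # ['anamnesis', 'pain pattern', 'localization', 'head']
```

Only the single key of the finding is stored; the path is recovered from its initial keys, which is the point of encoding the inheritance path in the key.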

2.3 Uniform representation

The hierarchy in Figure 3 shows two structures: an inheritance structure (characterized by two-headed arrows) and an element-set-relationship (characterized by single-headed arrows). With the semantic indexing, both structures can be represented and treated uniformly.

The uniform representation provides a consistent representational syntax across multiple implementations. It specifies syntactical rules for describing the representation and is thus an integral property of a knowledge base. Concept and node keys, together with the inheritance paths they contain, can therefore be used independently of specific inference methods or implementations.

2.4 Example “concept key”, “partial unification of subtrees”

Figure 4 shows the concept hierarchy of a domain. The nodes in the example are important concepts that are ordered according to the domain knowledge.

Figure 4: Excerpt from the domain’s concept hierarchies for “pain localization”, “occurrence of the pain” and “position”.

The concept hierarchy “position” enables a more precise specification of the concepts in “pain localization”. This is possible through the partial unification. The concept hierarchy “occurrence of the pain” also allows a more precise specification of the whole pain profile (Chapter 4).

An excerpt from the representation of the example from Figure 4 is shown in Listing 2¹⁹.

¹⁹ The prefixed letter a allows for an easier identification of the anamnesis subtree. In later examples, other prefix letters are also used for diagnosis and for therapy.

([a,x,0,1] "pain localization" ([a,x,0,x,0] [a,x,0,x,1] [a,x,0,x,6]))
        ([a,x,0,x,0] "mouth / face / head" ([a,1,0,1,0,0] [a,1,0,1,0,1]
         [a,1,0,1,0,2] [a,1,0,1,0,3] [a,1,0,1,0,4]))
                ([a,1,0,1,0,0] "mouth / face / head in general" ([a,1,0,1,x,x,1,1]
                 [a,1,0,1,x,x,1,2] [a,1,0,1,x,x,1,3]))
                ([a,1,0,1,0,1] "face" ([a,1,0,1,x,x,1,1] [a,1,0,1,x,x,1,2]
                 [a,1,0,1,x,x,1,3]))
                ([a,1,0,1,0,2] "forehead" ([a,1,0,1,x,x,1,1] [a,1,0,1,x,x,1,2]
                 [a,1,0,1,x,x,1,3]))
                ([a,1,0,1,0,3] "eye" ([a,1,0,1,x,x,1,1] [a,1,0,1,x,x,1,2]))
                (...)
        (...)
([a,1,0,1,x,x,1,1] "left")
([a,1,0,1,x,x,1,2] "right")
([a,1,0,1,x,x,1,3] "middle")
([a,x,0,8] "occurrence of the pain" ([a,x,0,8,0] [a,x,0,8,1] [a,x,0,8,2]
 [a,x,0,8,3] ...))
        ([a,x,0,8,0] "almost no pain anymore")
        ([a,x,0,8,1] "attacks with no pain in between")
        ([a,x,0,8,2] "attacks with slight pain in between")
        (...)
Listing 2: Syntactical representation of the excerpt from the concept hierarchies for “pain localization”, “occurrence of the pain” and “position”.

3 Indexing algorithm

After the presentation of the theoretical groundwork in the previous chapter, the indexing algorithm will now be presented²⁰.

²⁰ The algorithmic presentation and proofs follow an unpublished report by U. Petersohn and J. Lehmann of Technische Universität Dresden, Faculty of Computer Science, 2005.

3.1 Correctness and completeness of an indexing algorithm

Prior to the description of the algorithm, the properties of correctness and completeness will be defined.

Definition 13 (Correctness)

An indexing algorithm is correct if it fulfills the following properties:

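  1. Every concept $c$ is assigned a key $\underline{k}_c$ such that for any two keys $\underline{k}_{c_1}$ for concept $c_1$ and $\underline{k}_{c_2}$ for concept $c_2 \neq c_1$, no key is an instance of both $\underline{k}_{c_1}$ and $\underline{k}_{c_2}$.

  2. Let $\underline{k}_c$ be a key of length $l$ for an arbitrary concept $c$. Every node $n$ with the concept $c$ that is reached in the concept hierarchy via the path $p$ from the root is assigned a node key $k_n$ that fulfills the following conditions:

    a. $k_n$ is an instance of $\underline{k}_c$.

    b. Either $n$ is the root of the hierarchy, or there exists an $i$ with $1 \le i < l$ such that $n'$, the parent node of $n$, has as node key the initial key of $k_n$ of length $i$, i.e. $k_{n'}$ is an initial key of $k_n$.

    c. For all $j$ with $1 \le j \le l$, the initial key of $k_n$ of length $j$ is not the key of any node outside the set of nodes on $p$.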

The first criterion in the definition enables a clear distinction between different concept keys. The second criterion deals with node keys: 2a connects node keys and concept keys; 2b and 2c state that for a node $n$ all initial keys of $k_n$ appear directly on the path to $n$ and do not appear on any other path.
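Criteria 1 and 2a only involve the keys themselves, so they can be checked without the tree. The following Python sketch does this for the keys of Table 1. It relies on the observation that two keys of equal length share an instance exactly when, at every position, their values are equal or one of them is the variable. The encoding (lists, string `"x"`) is an assumption. Criteria 2b and 2c additionally need the hierarchy and are not checked here.

```python
X = "x"  # variable symbol (assumed encoding)

# Table 1: concept -> (concept key, node keys)
TABLE_1 = {
    "pain pattern": ([0], [[0]]),
    "cardinal symptom": ([0, 0], [[0, 0]]),
    "radiating pain": ([0, 1], [[0, 1]]),
    "localization": ([0, X, 0], [[0, 0, 0], [0, 1, 0]]),
    "intensity": ([0, 1, 1], [[0, 1, 1]]),
    "spine": ([0, X, 0, 0], [[0, 0, 0, 0], [0, 1, 0, 0]]),
    "head": ([0, X, 0, 1], [[0, 0, 0, 1], [0, 1, 0, 1]]),
    "shoulder/arm/hand": ([0, X, 0, 2], [[0, 0, 0, 2], [0, 1, 0, 2]]),
    "high": ([0, 1, 1, 0], [[0, 1, 1, 0]]),
    "medium": ([0, 1, 1, 1], [[0, 1, 1, 1]]),
}


def share_instance(k1, k2):
    """True if some key is an instance of both k1 and k2."""
    return len(k1) == len(k2) and all(a == b or X in (a, b) for a, b in zip(k1, k2))


def is_instance(candidate, key):
    return len(candidate) == len(key) and all(
        k == X or c == k for c, k in zip(candidate, key))


def satisfies_criteria_1_and_2a(table):
    concepts = list(table)
    # Criterion 1: keys of different concepts must not have a common instance.
    for i, c1 in enumerate(concepts):
        for c2 in concepts[i + 1:]:
            if share_instance(table[c1][0], table[c2][0]):
                return False
    # Criterion 2a: every node key is an instance of its concept key.
    return all(is_instance(nk, ck) for ck, node_keys in table.values() for nk in node_keys)


print(satisfies_criteria_1_and_2a(TABLE_1))  # True
# A hypothetical extra concept keyed [0,1,0] would clash with an instance of [0,x,0].
clash = dict(TABLE_1, bogus=([0, 1, 0], [[0, 1, 0]]))
print(satisfies_criteria_1_and_2a(clash))    # False
```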

Additional properties follow from the definition of correctness. They will be proven below.

Proposition 1 (Correctness)

If an indexing algorithm is correct according to Definition 13, it also has the following properties:

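  1. For two nodes $n_1$ and $n_2$ with $n_1 \neq n_2$, no key is an instance of both $k_{n_1}$ and $k_{n_2}$.

  2. For two nodes $n_1$ and $n_2$, with $n_1$ the parent node of $n_2$, $k_{n_1}$ is partially unifiable with $k_{n_2}$.

  3. For two concepts $c_1$ and $c_2$, and a node of concept $c_1$ that is the parent node of a node of $c_2$, $\underline{k}_{c_1}$ is partially unifiable with $\underline{k}_{c_2}$.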

The items of Proposition 1 will be proven consecutively (numbering as in Proposition 1).

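  1. Proof by contradiction: Let there be two nodes $n_1$ and $n_2$ with $n_1 \neq n_2$ and a key $k$ that is an instance of both $k_{n_1}$ and $k_{n_2}$. The proof is split in two parts:

    a. $n_1$ is on the path from $n_2$ to the root:
      Then, according to Definition 13 2b (applied repeatedly along the path), $k_{n_1}$ is a proper initial key of $k_{n_2}$ and therefore shorter than $k_{n_2}$. Since an instance always has the same length as the key it instantiates, $k$ cannot be an instance of both keys.

    b. $n_1$ is not on the path from $n_2$ to the root:
      Then, according to Definition 13 2c (for the special case $j = l$), $k_{n_2}$ is no node key of a node outside of the path $p$ to $n_2$. Because $n_1$ is outside of that path, $k_{n_1} \neq k_{n_2}$ follows.

  2. According to Definition 13 2b, $k_{n_1}$ is an initial key of $k_{n_2}$. The claim follows directly.

  3. Let $n_1$ be the node with concept $c_1$ and $n_2$ the node with concept $c_2$. Following from the preconditions, $n_1$ is a parent node of $n_2$. According to Definition 13 2b, $k_{n_1}$ is an initial key of $k_{n_2}$; therefore $k_{n_1}$ is partially unifiable with $k_{n_2}$. Furthermore, by Definition 13 2a, $k_{n_1}$ is an instance of $\underline{k}_{c_1}$ and $k_{n_2}$ is an instance of $\underline{k}_{c_2}$. Therefore $\underline{k}_{c_1}$ is partially unifiable with $\underline{k}_{c_2}$.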

The criterion for completeness is less complex than correctness, as shown by the following definition.

Definition 14 (Completeness)

An indexing algorithm is complete if it creates an index for each given concept hierarchy.

3.2 Description of the indexing algorithm

The algorithm uses the two operations generalization and expansion of keys.

Definition 15 (Generalization of keys)

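Let $k_1, \dots, k_m$ be a finite sequence of keys. The generalized key $g$ of these keys is defined position by position: at each position, $g$ keeps a constant shared by all keys and otherwise holds the variable $x$.

$g$ has the same length as the longest key $k_j$.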

Example 3 (Generalization of keys)

This example shows the generalization of the keys and .

The generalization is a consecutive comparison of all keys at a specific position. If there are different values at the same position or the variable appears, the result at that position is $x$. If all values at a position are the same natural number, that number is kept.
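A minimal Python sketch of this positionwise comparison follows. Keys are lists with the string `"x"` as the variable, which is an assumed encoding. The text does not say how positions that only the longer keys possess are treated. The sketch assumes that shorter keys simply do not take part in the comparison at those positions, so the constant of the longer keys is kept there.

```python
X = "x"  # variable symbol (assumed encoding)


def generalize(*keys):
    """Positionwise generalization of keys (Definition 15, sketch).
    A position keeps its constant if every key that has this position holds that same
    constant; otherwise (different values, or x occurs) it becomes x. Assumption: keys
    shorter than a position do not take part in the comparison at that position."""
    length = max(len(k) for k in keys)
    result = []
    for i in range(length):
        values = {k[i] for k in keys if len(k) > i}
        result.append(values.pop() if len(values) == 1 else X)
    return result


print(generalize([0, 0, 0], [0, 1, 0]))        # [0, 'x', 0]     ("localization", Table 1)
print(generalize([0, 0, 0, 1], [0, 1, 0, 1]))  # [0, 'x', 0, 1]  ("head", Table 1)
```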

Definition 16 (Expansion of keys)

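The expansion of a key $k_1$ of length $l_1$ towards a key $k_2$ of length $l_2$ with $l_1 \le l_2$ is defined as follows:

$$e(i) = \begin{cases} k_1(i) & \text{for } 1 \le i \le l_1, \\ k_2(i) & \text{for } l_1 < i \le l_2. \end{cases}$$

$e$ has the length $l_2$.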

During the expansion of a key $k_1$ towards another key $k_2$, $k_1$ is filled up with values from $k_2$ so that both have the same length.
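Expansion is a simple list operation. The Python sketch below uses lists with `"x"` as the variable, which is an assumed encoding. It also checks the property stated in Example 4, namely that expanded keys are instances of the generalized key. The generalized key `[0, x, x, 2]` used here is a hypothetical value for illustration.

```python
X = "x"  # variable symbol (assumed encoding)


def expand(k1, k2):
    """Definition 16: fill k1 up with the trailing values of k2 (requires len(k1) <= len(k2))."""
    if len(k1) > len(k2):
        raise ValueError("the key to be expanded must not be longer than the target key")
    return list(k1) + list(k2[len(k1):])


def is_instance(candidate, key):
    return len(candidate) == len(key) and all(
        k == X or c == k for c, k in zip(candidate, key))


generalized = [0, X, X, 2]  # hypothetical generalized key
expanded = expand([0, 0, 1], generalized)
print(expanded)                            # [0, 0, 1, 2]
print(is_instance(expanded, generalized))  # True
```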

Example 4 (Generalization and final expansion of keys)

After the generalization of a set of keys and the subsequent expansion of the shorter keys towards the generalized key, it is clear that all created keys are instances of the generalized key. Combining the previous examples for generalization and expansion results in the following:


The boxed was added during the expansion towards the generalized key. The keys and are instances of .

As an abbreviation, a notation for the “parent nodes of a concept” is introduced (Definition 17). The quotation marks are used because a concept itself is not a node, and therefore also does not have a parent node.

Definition 17 (“Parent nodes of a concept”)

For a concept $c$, $P(c)$ is defined as the set of all nodes in the concept hierarchy that have a child node with concept $c$.

Example 5 (“Parent nodes of a concept”)

In Figure 2, $P$(“localization”) is a set with the two nodes “cardinal symptom” and “radiating pain”.
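The set of “parent nodes of a concept” can be collected in one traversal of the hierarchy. In this Python sketch, a node is a hypothetical `(concept, children)` tuple. Because the same concept may occur at several places, each parent node is reported by its path of concepts from the root. The tree reproduces Figure 2 as far as it can be read off Table 1.

```python
def parent_nodes(root, concept):
    """Definition 17: all nodes that have a child node with the given concept.
    Each node is identified by the list of concepts on its path from the root."""
    result = []

    def walk(node, path):
        name, children = node
        path = path + [name]
        if any(child_name == concept for child_name, _ in children):
            result.append(path)
        for child in children:
            walk(child, path)

    walk(root, [])
    return result


def localization():
    return ("localization", [("spine", []), ("head", []), ("shoulder/arm/hand", [])])


figure2 = ("pain pattern", [
    ("cardinal symptom", [localization()]),
    ("radiating pain", [localization(), ("intensity", [("high", []), ("medium", [])])]),
])
print(parent_nodes(figure2, "localization"))
# [['pain pattern', 'cardinal symptom'], ['pain pattern', 'radiating pain']]
```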

Based on the previous definitions, the algorithm can now be specified. The input for the algorithm is the given concept hierarchy. Additionally, for each node a counter is stored and initialized with 0. Furthermore, every node in the concept hierarchy stores the respective node key.

Algorithm 1 (Indexing algorithm)

The indexing algorithm can be specified as follows.

Input: concept hierarchy $H$ with a counter in each node
Output: all concept keys and node keys
Initialization: the root node of $H$ and its concept get the key [0]

The following operations are repeated until every concept has a key:

  • OP I - selection operation:
    Choose a concept $c$ such that all elements of $P(c)$ already have a node key.

  • OP II - derivation operation:
    For each element $p \in P(c)$, a key is generated by appending the counter of $p$ to the key $k_p$. Next, the counter of $p$ is raised by 1. If the concept appears twice or more with the same path length, two or more keys of equal length are created. Those are generalized in OP III; an expansion (OP IV) is not necessary for them because the keys have the same length. The result is a concept key. The node keys are instances of that key and are updated in the concept hierarchy.

  • OP III - generalization operation:
    All generated keys are generalized according to Definition 15. If the generalized key has instances in common with a concept key that was already created, more numbers are appended until this is no longer the case. The resulting key is $\underline{k}_c$.

  • OP IV - expansion operation:
    Now all node keys are expanded towards $\underline{k}_c$. If keys of differing length were created, the shorter ones are expanded according to Definition 16. In this manner, all node keys are obtained.

  • OP V - addition operation:
    The obtained node keys are added to the respective nodes in $H$.
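The following Python sketch runs Algorithm 1 on the hierarchy of Figure 2 and reproduces the keys of Table 1. Several details are assumptions not fixed by the text:

  • the `Node` class;
  • the rule that OP I picks the first selectable concept in preorder;
  • the treatment of positions that only longer keys possess during generalization;
  • appending the number 0 in OP III while a generalized key still overlaps an existing concept key.

If OP I finds no selectable concept, the sketch raises an error. By the completeness argument in Chapter 3.4, this can only happen if the dependency graph has a cycle.

```python
X = "x"  # variable symbol (assumed encoding)


class Node:
    """Hypothetical tree node with the counter and node key used by Algorithm 1."""

    def __init__(self, concept, children=()):
        self.concept, self.children = concept, list(children)
        self.parent, self.counter, self.key = None, 0, None
        for child in self.children:
            child.parent = self


def all_nodes(root):
    """All nodes in preorder (root first, children left to right)."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(reversed(node.children))
    return out


def share_instance(k1, k2):
    return len(k1) == len(k2) and all(a == b or X in (a, b) for a, b in zip(k1, k2))


def generalize(keys):
    result = []
    for i in range(max(len(k) for k in keys)):
        values = {k[i] for k in keys if len(k) > i}
        result.append(values.pop() if len(values) == 1 else X)
    return result


def index(root):
    nodes = all_nodes(root)
    order = list(dict.fromkeys(n.concept for n in nodes))  # concepts by first occurrence
    root.key = [0]                                          # initialization
    concept_keys = {root.concept: [0]}
    while len(concept_keys) < len(order):
        # OP I: a concept whose "parent nodes" all have node keys already
        selectable = [c for c in order if c not in concept_keys and all(
            n.parent.key is not None for n in nodes if n.concept == c)]
        if not selectable:
            raise RuntimeError("no selectable concept: dependency graph has a cycle")
        c = selectable[0]
        targets = [n for n in nodes if n.concept == c]
        # OP II: append each parent's counter to its key, then raise the counter
        new_keys = []
        for n in targets:
            new_keys.append(n.parent.key + [n.parent.counter])
            n.parent.counter += 1
        # OP III: generalize; append 0 (assumption) while instances still overlap
        ck = generalize(new_keys)
        while any(share_instance(ck, other) for other in concept_keys.values()):
            ck = ck + [0]
        concept_keys[c] = ck
        # OP IV and OP V: expand every node key towards ck and store it in its node
        for n, k in zip(targets, new_keys):
            n.key = k + ck[len(k):]
    return concept_keys


def localization():
    return Node("localization", [Node("spine"), Node("head"), Node("shoulder/arm/hand")])


root = Node("pain pattern", [
    Node("cardinal symptom", [localization()]),
    Node("radiating pain", [localization(), Node("intensity", [Node("high"), Node("medium")])]),
])
keys = index(root)
for concept, key in keys.items():
    print(concept, key)
```

The printed concept keys match the concept-key column of Table 1, and the node keys stored in the `Node` objects match its node-key column. The preorder choice matters: "localization" must be indexed before "intensity" so that "radiating pain" hands out its counters in the order that Table 1 shows.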

The described indexing algorithm is illustrated with a detailed example in Chapter 3.5.

3.3 Proof of the correctness of the algorithm

Proposition 2 (Correctness)

Algorithm 1 is correct.

To prove the correctness of the algorithm, all properties from Definition 13 have to be shown. The following chapter uses the same notations and numberings as the definition.

  1. This property follows directly from OP III and IV. The algorithm demands that a created concept key must not have instances in common with any other concept key.

  2. The conditions on the node keys $k_n$ of nodes $n$ with concept $c$ are shown separately:

    a. $\underline{k}_c$ is the generalization of all node keys with the concept $c$. Therefore, every node key $k_n$ of a node $n$ with concept $c$ is an instance of $\underline{k}_c$.

    b. For the parent node $n'$ of a node $n$, the node key $k_n$ is generated in the algorithm by appending a number to $k_{n'}$. Then the key is expanded. Clearly, $k_{n'}$ is an initial key of $k_n$.

    c. As a proof by contradiction, assume that there exists a node $n'$ whose key $k_{n'}$ is an initial key of $k_n$, but which is not part of the path from $n$ to the root. Then there is a node $v$ in the concept hierarchy at which the paths from $n$ and $n'$ to the root intersect. Let $k_v$ have the length $l$. $v$ has a child node $v_1$ on the path from $n$ to the root and a child node $v_2$ on the path from $n'$ to the root. Figure 5 illustrates this; in the figure, dashed lines symbolize potentially multiple nodes. Because every node of the concept hierarchy by definition only has child nodes with different concepts, $v_1$ and $v_2$ belong to different concepts. Therefore, the counters that were appended during the computation of $k_{v_1}$ and $k_{v_2}$ in OP II are different. Hence, $k_{v_1}$ and $k_{v_2}$ differ at position $l+1$. By applying 2b multiple times for $n$ and $n'$, it follows that $k_{v_1}$ is an initial key of $k_n$ and $k_{v_2}$ is an initial key of $k_{n'}$. Thus, $k_n$ and $k_{n'}$ also differ at position $l+1$, and $k_{n'}$ cannot be an initial key of $k_n$.

Figure 5: Illustration for the proof of correctness.
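Properties 2b and 2c can be checked mechanically on a small example. The sketch below uses a hypothetical toy hierarchy with invented node names. It builds keys as in OP II by appending the parent's running counter, assuming a root key of (1,) and counters starting at 1, and it omits expansion because the toy hierarchy needs none. It can then be used to verify that one key is an initial key of another exactly when the first node lies on the path from the second node to the root.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy hierarchy, given as child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None, "a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b",
}


def build_keys(parent: Dict[str, Optional[str]]) -> Dict[str, Tuple[int, ...]]:
    """Give each node its parent's key plus the parent's running counter.

    Assumptions: the root key is (1,), every counter starts at 1, and the
    expansion step is omitted.
    """
    root = next(n for n, p in parent.items() if p is None)
    keys: Dict[str, Tuple[int, ...]] = {root: (1,)}
    counter = {n: 1 for n in parent}
    pending = [n for n in parent if n != root]
    while pending:
        ready = [n for n in pending if parent[n] in keys]  # parent already indexed
        if not ready:
            raise ValueError("some nodes are not connected to the root")
        for node in ready:
            p = parent[node]
            keys[node] = keys[p] + (counter[p],)
            counter[p] += 1
            pending.remove(node)
    return keys


def is_initial_key(prefix: Tuple[int, ...], key: Tuple[int, ...]) -> bool:
    """True if `prefix` is an initial key (prefix) of `key`."""
    return key[:len(prefix)] == prefix


def path_to_root(node: Optional[str], parent: Dict[str, Optional[str]]) -> List[str]:
    """The node itself and all of its predecessors up to the root."""
    path: List[str] = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path
```

For every pair of nodes m and n in `PARENT`, `is_initial_key(keys[m], keys[n])` agrees with `m in path_to_root(n, PARENT)`.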

3.4 Proof of the completeness of the algorithm

Proposition 3 (Completeness)

Algorithm 1 is complete.

For the proof of completeness, it has to be shown that the algorithm computes an output in finite time for every input. Operations II to V are simple computations that are clearly executable in finite time. It therefore suffices to consider operation I, which requires choosing a concept c such that the parent nodes of all nodes with concept c already have node keys. Since each iteration indexes one further concept and there are only finitely many concepts, it has to be shown that such a concept exists in every case.

For the proof, assume that no such concept can be chosen. It is then shown that in this case the dependency graph of the concept hierarchy contains a cycle, which violates the definition of concept hierarchies (Definition 1).

The indexing algorithm uses operation I to compute the node keys such that every newly indexed node already has an indexed parent. Assume that the algorithm reaches a point at which there is no concept c for which the parent nodes of all nodes with concept c are already indexed. In that step there is a set of unindexed nodes whose parent nodes are already indexed. Let c_1, ..., c_k be the concepts of these nodes, and let T_i denote the subtrees spanned by the nodes of this set that have concept c_i. Figure 6 illustrates the situation.

Figure 6: Illustration for the proof of completeness.

The gray nodes have already been indexed. The nodes framed by the dashed line are the unindexed nodes whose parents are already indexed.

Because none of the concepts c_i can be chosen, for every c_i there are nodes with concept c_i that are neither in this set nor among the already indexed nodes.

Let x be such a node with concept c_1. Every unindexed node lies below one of the nodes of the set, so x appears in one of the subtrees T_1, ..., T_k. If x appears in T_1, the dependency graph contains a cycle: T_1 is spanned by a node with concept c_1, and below it lies the node x, which also has concept c_1. In the dependency graph, an arrow would therefore lead from c_1 (possibly via intermediate concepts) back to c_1. Hence x cannot be part of T_1.

Then x has to appear in at least one other subtree, say T_2. Because x lies below a node with concept c_2 in the concept hierarchy, the dependency graph contains a path from c_2 to c_1, so c_1 depends on c_2.

Starting from c_2, the same argument can be applied again. If a node with concept c_2 of the kind described above appeared in T_1 or T_2, the dependency graph would contain a cycle. It follows that there is another subtree, say T_3, in which such a node with concept c_2 appears, and the dependency graph contains a path c_3 → c_2 → c_1.

Applying this argument k times results in a path c_{k+1} → c_k → ... → c_1 through k + 1 concepts in the dependency graph. But because there are only k different concepts c_1, ..., c_k, one of them has to appear more than once on that path. Therefore, the dependency graph contains a cycle, which violates the definition of concept hierarchies.
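Operation I, which the completeness proof is about, can be sketched as a loop that repeatedly chooses a concept all of whose nodes already have indexed parents. In this hypothetical encoding, a hierarchy maps each node to its parent and its concept. If no concept can be chosen, the sketch reports a cyclic dependency graph, which is exactly the situation the proof rules out for valid concept hierarchies. The loop resembles Kahn's topological sort, applied to concepts instead of nodes; the function name `concept_order` is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: node -> (parent node, concept); the root has parent None.
Hierarchy = Dict[str, Tuple[Optional[str], str]]


def concept_order(h: Hierarchy) -> List[str]:
    """Repeat operation I until every concept is indexed; return the chosen order."""
    indexed = {n for n, (p, _) in h.items() if p is None}  # root indexed at initialization
    remaining = {c for (p, c) in h.values() if p is not None}
    order: List[str] = []
    while remaining:
        # Concepts whose nodes all have an already indexed parent.
        ready = sorted(
            c for c in remaining
            if all(p in indexed for (p, cc) in h.values() if cc == c and p is not None)
        )
        if not ready:
            # Proposition 3: this can only happen if the dependency graph has a cycle.
            raise ValueError("no concept can be chosen; the dependency graph is cyclic")
        chosen = ready[0]  # any ready concept would do; sorting makes the run deterministic
        order.append(chosen)
        remaining.discard(chosen)
        indexed |= {n for n, (_, c) in h.items() if c == chosen}
    return order
```

A concept that occurs at two depths is chosen only after the parents of both of its nodes are indexed, which mirrors Step 3 and Step 4 of the following example.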

The proofs of correctness and completeness show that the algorithm always terminates with a correct result.

3.5 Example for the operating principle of the indexing algorithm

3.5.1 Example

The operating principle of the algorithm is now demonstrated with an example.

Initialization

The starting point is the concept hierarchy, which the algorithm initializes. The root node and its concept can already be indexed, and the counters stored for each node are initialized, as shown in Figure 7.

Figure 7: Initialization.
Step 1

In this step the first concept is chosen (in accordance with OP I). The set of parent nodes of its nodes contains only one node, the root node. The counter of the root node is appended to the root key, which yields the new node key, and the counter of the root node is then incremented. Because only one key is created, OP III (generalization) and OP IV (expansion) are trivial, and the resulting concept key equals this node key. In OP V the node key is updated in the concept hierarchy.

Figure 8: Step 1.
Step 2

Now the next concept is chosen (OP I). Analogous to the previous step, its key is obtained by appending the counter of the root node to the root key. As in the previous step, OP III and OP IV are trivial.

Figure 9: Step 2.
Step 3

Now the third concept is chosen. Because this concept appears twice, two keys are created. They are generalized in OP III (see the auxiliary calculation). An expansion (OP IV) is not necessary because both keys have the same length. The node keys are instances of the resulting concept key and are updated in the concept hierarchy.

Figure 10: Step 3.
Step 4

In this step only one concept can be chosen. Analogous to the previous steps, two keys are generated, and their generalization yields the concept key. Because the two created keys do not have the same length, they are expanded towards the generalized key in accordance with OP IV: the boxed element shown in Figure 11 is added to one of the keys. This results in the two final node keys.

Figure 11: Step 4.

Step 5
The last concept is indexed in the same way. Note that, in general, it has to be checked after the generalization whether the obtained key shares instances with other concept keys. If so, further numbers must be appended to the generalized key until no instances are shared. Because all concepts are now indexed, the indexing algorithm terminates.

Figure 12: Step 5.

3.5.2 Remarks

  • The algorithm yields one possible indexing. Other indexings exist, because nodes can appear in different orders along each path. It is not practical to consider all possible indexings; if necessary, the input concept hierarchy can be suitably sorted beforehand.

  • If needed, side conditions for the index creation can be built in, as long as correctness and completeness are not violated. One-to-one (bijective) renaming is possible.

  • If concepts are distributed across multiple levels of a concept hierarchy, a large number of variables may have to be inserted. Ideally, each concept should occur on only one level, which also gives a clearer structure. More complex compositions of concepts can be modeled more clearly with a multiaxial description (Chapter 4).

4 Multiaxial modeling of concept hierarchies

4.1 Forced hierarchization versus independent trees

A prerequisite for semantic indexing is the existence of a concept hierarchy, or the possibility of modeling one (Chapter 2). A possible problem of semantic indexing is a disadvantageous forced hierarchization of aspects during knowledge acquisition and formalization, or, put differently, the need to introduce many variables to represent concepts that are used on multiple levels. This happens in particular when the structure of the knowledge base is still unclear in the modeling phase. The recommended solution is a multiaxial description and representation.

Frequently, a concept hierarchy can model only one specific aspect of reality, yet a powerful system should be able to connect multiple aspects (axes). This connection is declared by the multiaxial description, which can easily be handled with semantic indexing. For each individual axis there exists a concept hierarchy that can be indexed. Each concept hierarchy has a unique name and key. The axes are then combined conjunctively (Chapter 2.1).

4.2 Concept hierarchies, uniaxial and multiaxial systems

Terminologies and taxonomies have to represent the complete range of formulations and synonyms of the application domain.

They serve to create concept hierarchies, to classify, and to use adequate, semantically correct concepts based on propositions of the domain.

Definition 18 (System of order)

Systems of order for concepts systematically map propositions to concept units. An order is formed through conceptual, systematic and semantic axes, on the basis of which concept hierarchies can be defined.

In general the following requirements have to be met by systems of order:

  • Completeness:
    The considered domain has to be represented completely, i.e. there must not be any missing concepts and it has to be possible to add new concepts.

  • Disjointness:
    All concepts should be represented uniquely without overlaps. Redundancies should be avoided. If multiple identifiers are necessary, synonymic links and preferred identifiers can be added. Uniqueness of concepts has to be preserved. Homonyms and ambiguities should be avoided.

  • Classification:
    The system of order has to be consistent, free of contradictions and transparent, following a classification that is scientifically or practically recognized. If a single classification is not sufficient, multiple classifications have to be allowed.

Depending on the number of semantic axes, uniaxial and multiaxial systems are distinguished.

Definition 19 (Monoaxial or uniaxial system of order)

All concepts of interest are described with one axis.

The domain of discourse is ordered by successively adding one distinguishing feature per hierarchical level, from the general to the specific. In general, the classes cannot be combined with each other [31].

Definition 20 (Multiaxial system of order)

To systematize concepts and classes, multiple axes are used. A multiaxial system of order is based on a category structure: concepts from multiple categories or semantic axes are combined to express one complex concept. Each semantic axis corresponds to a different area of information [31].

Following Definition 3, the concept hierarchies (Definition 1) are ordered with the relations “more specific” and “more general”. These concept hierarchies are also the input for the semantic indexing algorithm (Chapter 3.2).

4.3 Extension with multiaxial descriptions

In the example in Figure 13, “pain quality”, which has not been mentioned so far, is an aspect of pain that can be surveyed independently of the localization. If, for example, it is known that the patient feels a strong piercing pain, the knowledge base does not yet need to know whether the pain is located at the temples, and vice versa. If this aspect were modeled in a single tree, the subtree “pain quality” would have to be placed either above the localization, so that each quality node carries a subtree with the complete localization tree (not just each leaf node, because the detailing can stop early), or the other way around. This would make the tree less clear and harder to manage. One workaround is to use defined levels, e.g. levels 1 to 5 specify the localization and levels 6 and 7 the quality, but this requires additional variables. Furthermore, each aspect would need a defined depth, even where certain aspects need not be described in such detail. Such a “forced hierarchization” can make a knowledge base unnecessarily complex and derivations harder. In that case it is better to represent different aspects in independent trees and link them to each other.

For each individual axis there exists a concept hierarchy that can be indexed, and every concept hierarchy can be given a unique name. Figure 13 shows two highly simplified concept hierarchies, one for pain quality and one for localization, each identified by its own name.

Figure 13: Conjunctive multiaxial description with two axes.

To describe, for example, a piercing headache, both concept hierarchies can be combined conjunctively. For this, the appropriate node key is stored for each axis: in the example in Figure 13, a piercing headache is described by assigning the node key for “piercing” to the quality axis and the node key for the head to the localization axis. This principle can be extended by allowing several multiaxial descriptions. For example, if a patient has a piercing headache and an additional throbbing pain in the arm, this can be denoted as a set of two such descriptions. Of course, other notations following the same principle are possible.

More important than the specific notation is the fact that multiaxial descriptions can be easily realized with already indexed concept hierarchies. Compared to a uniaxial model, this has the benefit that multiple axes can be modeled independently from each other. Furthermore, a concept hierarchy that includes all axes would grow very fast and could no longer be managed efficiently. An additional benefit is that various axes can be combined as needed. To expand the above example, it would be possible to declare other axes (e.g. topography, development over time, etc.), so that the symptoms can be described more precisely.
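A conjunctive multiaxial description can be represented as a mapping from axis names to node keys, and a list of such mappings describes several findings. The sketch below assumes hypothetical keys (quality: piercing = (1, 1), throbbing = (1, 2); localization: head = (1, 1), temples = (1, 1, 1), arm = (1, 2)) and a simplified matching rule in which the query key must be an initial key of the stored key; variables in keys are ignored. The names `Description` and `matches` are illustrative.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, ...]
# One conjunctive multiaxial description: axis name -> node key (hypothetical encoding).
Description = Dict[str, Key]


def is_initial_key(prefix: Key, key: Key) -> bool:
    """Assumed matching rule: the query key must be an initial key (prefix) of the stored key."""
    return key[:len(prefix)] == prefix


def matches(descriptions: List[Description], query: Description) -> bool:
    """True if a single stored description satisfies every axis of the query."""
    return any(
        all(axis in d and is_initial_key(key, d[axis]) for axis, key in query.items())
        for d in descriptions
    )
```

Because each description is matched as a whole, a patient stored as `[{"quality": (1, 1), "localization": (1, 1, 1)}, {"quality": (1, 2), "localization": (1, 2)}]` (piercing pain at the temples, throbbing pain in the arm) matches a query for a piercing headache, but not a query for a piercing pain in the arm.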

5 Hierarchies of deduced concepts and d-concept descriptions

5.1 Hierarchies of d-concepts

Besides the domain-specific “atomic” concept hierarchies discussed so far, hierarchies of deduced concepts are also used and have to be indexed. Together with the deduced concept descriptions, they serve to represent generic knowledge and to support inference. To improve readability, deduced concepts are abbreviated as d-concepts in the following chapters.

Definition 21 (Hierarchy of d-concepts / d-concept hierarchy)

A hierarchy of d-concepts is represented by a concept hierarchy (Definition 1). Nodes correspond to d-concepts. Every node has a unique concept key (Definition 12), a textual description and references (concept keys) to a set of subordinate concepts. If edges exist between d-concepts, the respective nodes have to be partially unifiable (Definition 11). The graph must not contain any cyclical definitions. Within the hierarchy of d-concepts, the relations between the individual d-concepts are “more general than” and “more specific than” (Definition 23).

Definition 22 (Deduced concept / d-concept)

Deduced concepts (d-concepts) are generic objects that refer to a set of instances. The instances that fulfill a d-concept are called the d-concept extension. Necessary and sufficient conditions for instances are declared by the d-concept description.

First, inferences with d-concepts will be considered. A given situation description consists of a tuple of node keys at a certain time.

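A d-concept d itself is a unary predicate over a base set X:

$$d\colon X \to \{0, 1\}, \qquad d(x) = \begin{cases} 1 & \text{if } x \text{ belongs to } d,\\ 0 & \text{if } x \text{ does not belong to } d. \end{cases}$$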

To describe relations in the domain of discourse, the d-concept knowledge base contains heuristic knowledge of the domain (domain knowledge) as well as generic knowledge.

Heuristic knowledge still reflects the subjectivity of its creator and will therefore not always be sufficiently accepted. Generic knowledge, in contrast, represents important general relations and is objective in nature.

5.2 Inheritance of d-concepts

For hierarchies of d-concepts, inheritance is important. A child d-concept inherits from a parent d-concept if the child is a more specific version of the parent. For example, if the set of situation descriptions classified as the child is a subset of the descriptions classified as the parent, then all instances that fulfill the child also fulfill the parent. The relation from the parent to the child is therefore called “more general than”.

Definition 23 (“More general than”, “more specific than”)

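A d-concept $d_j$ is “more general than” a d-concept $d_k$ if every instance of $d_k$ is also an instance of $d_j$, i.e. $\forall x: d_k(x) = 1 \Rightarrow d_j(x) = 1$, denoted as $d_j \geq_g d_k$. The inverse relation is called “more specific than”.

The relation “more general than” is a pseudo-order (preorder), i.e. it is reflexive and transitive: if $d_i \geq_g d_j$ and $d_j \geq_g d_k$, then $d_i \geq_g d_k$. As long as every d-concept has a unique name, the relation is also a partial order, because it is then also antisymmetric: if $d_j \geq_g d_k$ and $d_k \geq_g d_j$, then $d_j = d_k$.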
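Definition 23 can be illustrated with d-concepts given directly by their extensions. The sketch below uses invented d-concepts over a base set of four situation descriptions; it decides “more general than” by set inclusion and checks reflexivity, transitivity and antisymmetry on this toy data only. The names `EXTENSION`, `more_general`, `is_preorder` and `is_antisymmetric` are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

# Hypothetical d-concepts, each given by its extension over a small base set of situations.
EXTENSION: Dict[str, FrozenSet[int]] = {
    "pain": frozenset({1, 2, 3, 4}),
    "headache": frozenset({1, 2}),
    "migraine": frozenset({1}),
    "arm_pain": frozenset({3}),
}


def more_general(dj: str, dk: str, ext: Dict[str, FrozenSet[int]] = EXTENSION) -> bool:
    """dj is more general than dk iff every instance of dk is also an instance of dj."""
    return ext[dk] <= ext[dj]


def is_preorder(ext: Dict[str, FrozenSet[int]] = EXTENSION) -> bool:
    """Check reflexivity and transitivity over all d-concepts in `ext`."""
    names = list(ext)
    reflexive = all(more_general(a, a, ext) for a in names)
    transitive = all(
        more_general(a, c, ext)
        for a, b, c in product(names, repeat=3)
        if more_general(a, b, ext) and more_general(b, c, ext)
    )
    return reflexive and transitive


def is_antisymmetric(ext: Dict[str, FrozenSet[int]] = EXTENSION) -> bool:
    """Check that mutual generality only holds for identical d-concepts."""
    return all(
        a == b
        for a, b in product(ext, repeat=2)
        if more_general(a, b, ext) and more_general(b, a, ext)
    )
```

Here “pain” is more general than “migraine”, while “headache” and “arm_pain” are incomparable, so the relation is only a partial order and not a total one.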

5.3 Descriptions of d-concepts

To describe d-concepts, the respective concept keys of the domain are used to specify the defined d-concept as accurately as possible. Thus, an association between the attribute characteristics of the declared d-concepts is established. Cyclical definitions are not allowed.

Definition 24 (Deduced concept description / d-concept description)

Every d-concept is declared by its d-concept description. The d-concept description contains the necessary and sufficient conditions for instances.

The d-concept description implies:

  • For “atomic” concepts, the validity of the concept keys has to be determined by a partial unification with the node keys from the knowledge base or by querying the agent in the application.

  • For d-concepts, the validity of their concept keys has to be determined via inference regarding all necessary and sufficient conditions of the d-concept description.

  • Hierarchies of d-concepts use the “more specific than” and “more general than” relations. This also applies to the d-concept descriptions, in which subordinate descriptions are declared by specializing the superordinate descriptions and by inheritance.

  • The d-concepts are declared with concept keys based on domain knowledge. Based on the inheritance relationships between the keys, concept keys are assigned to the necessary and sufficient conditions such that maximally general d-concept descriptions (Definition 25) appear on every hierarchical level and the relations “more general than/more specific than” (Definition 23) are not violated.

Definition 25 (Maximally general d-concept description)

A d-concept description is maximally general if it does not cover any negative instances (misclassification) and no other d-concept description that also covers no negative instances is strictly more general than it.

6 Discussion of inference methods

With the uniform representation using node and concept keys, classic inference methods can be executed effectively, even on top of databases. Complex knowledge queries are possible, in which generic and case-based knowledge can be used together. The obtained solutions are stored as instances in a database.

6.1 Logic-based inference

For example, clauses can be used for the d-concept descriptions. The representational paradigm corresponds (stated here without proof) to an extended monadic predicate logic. Hence, the known algorithms for logical inference can be applied. From the introduced node and concept keys, index tables containing the inheritance paths can be built. Combined with partial matching, this yields an efficient implementation of logic-based inference.
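Logic-based inference over keys can be sketched as forward chaining over Horn rules whose premises and conclusions are keys. This is an illustrative assumption, not the paper's inference engine: a premise is taken to hold if it is an initial key of some known fact, which stands in for partial matching, and all keys and rules are invented (for example, d-concept keys are placed under the hypothetical top-level number 9).

```python
from typing import List, Set, Tuple

Key = Tuple[int, ...]
# A Horn rule: if all premise keys hold, the conclusion (a d-concept key) holds.
Rule = Tuple[Tuple[Key, ...], Key]


def holds(key: Key, facts: Set[Key]) -> bool:
    """Assumed partial matching: a key holds if it is an initial key of some fact."""
    return any(f[:len(key)] == key for f in facts)


def forward_chain(facts: Set[Key], rules: List[Rule]) -> Set[Key]:
    """Apply the rules until no new conclusion can be derived (a fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(holds(p, derived) for p in premises):
                derived.add(conclusion)
                changed = True
    return derived
```

With the facts {(1, 1, 2), (2, 3)} and a rule deriving (9, 1) from the premises (1, 1) and (2,), the premise (1, 1) is satisfied by the more specific fact (1, 1, 2), which is how the keys let a general condition match specific observations.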

6.2 Concept-based reasoning

In general, the decidability and complexity of solutions depend on the logical language. Horn clauses allow a compact representation and are efficiently decidable. However, the following problems arise that standard Horn resolution does not address:

  1. The user does not want to query every possible d-concept (e.g. diagnosis). Instead, the system should automatically find all valid d-concepts.

  2. Because the d-concepts of a terminology are structured hierarchically, for a d-concept, all more general d-concepts can be derived as well. This is defined with the relation “more general than” in Chapter 5.2. The expert is usually only interested in the most specific derivable diagnosis (d-concept).

Definition 26 (“Most specific”)

For a given situation description s, a d-concept d is the most specific with respect to s if d is valid for s and the set of all other d-concepts valid for s does not contain any d' such that d is more general than d' (Definition 23).

For a situation description there can be more than one “most specific” concept.

To

  • find all valid d-concepts for the knowledge-based agent and

  • visualize the most specific of them to the expert

the following approach is chosen: starting at the root node, the system attempts to validate the d-concepts in the hierarchy and continues with the children of every d-concept that has been found valid. If none of the children of a valid d-concept d can be derived, d is most specific and is a solution [26].
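The top-down search for the most specific valid d-concepts can be sketched as a depth-first traversal that only descends into valid children. The hierarchy `CHILDREN`, the d-concept names and the validity test are hypothetical; in the described system, validity would come from evaluating the d-concept descriptions against a situation description.

```python
from typing import Callable, Dict, List

# Hypothetical d-concept hierarchy: d-concept -> child d-concepts.
CHILDREN: Dict[str, List[str]] = {
    "pain": ["headache", "arm_pain"],
    "headache": ["migraine", "tension_headache"],
}


def most_specific(root: str, children: Dict[str, List[str]],
                  valid: Callable[[str], bool]) -> List[str]:
    """Validate top-down from the root; a valid d-concept without valid children is most specific."""
    if not valid(root):
        return []
    result: List[str] = []
    stack = [root]
    seen = {root}  # a d-concept may be reachable via several parents
    while stack:
        d = stack.pop()
        valid_children = [c for c in children.get(d, []) if valid(c)]
        for c in valid_children:
            if c not in seen:
                seen.add(c)
                stack.append(c)
        if not valid_children:
            result.append(d)
    return sorted(result)
```

With `CHILDREN` and a validity test that accepts pain, headache, migraine and arm_pain, the sketch returns ['arm_pain', 'migraine'], i.e. two most specific d-concepts for one situation description.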

6.3 Case-based reasoning

A classic method for using knowledge gained through experience in knowledge processing is case-based reasoning (CBR) [23].

For each case, the experience gained in solving the problem is stored.

Definition 27 (Case)

A case is the description of a problem situation that has actually occurred, together with the experience gained while handling the problem. Each case in a case base consists of a problem description p and its solution s as an ordered pair (p, s). Additionally, a case can contain explanations or an assessment e of the results of the solution and is then represented by the triple (p, s, e).

If a new problem has to be solved, a solution is searched for based on previous experience with similar problems. Figure 14 illustrates the principle of reasoning.

Figure 14: Case-based reasoning.

One performance-critical aspect of CBR is storing resolved cases in a potentially very large case base and efficiently retrieving cases similar to a query case. In previous work, the case representation was often designed specifically for the retrieval process [23]. As in the early development of database management systems (DBMS), data independence is desirable for a case base as well.

The described representational paradigm uses concept hierarchies, and it instantiates and stores cases as sequences of episodes, which in turn consist of sets of term instances. This supports a broad applicability of similarity-based reasoning and provides a general infrastructure [26]. For the case in which the available similarity measures are only weakly constrained, i.e. they only have to satisfy the metric axioms, [18] and [19] describe efficient methods for the consistent storage and indexing of cases for similarity search.
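Storing cases as pairs or triples and retrieving the most similar one can be sketched as follows. The problem description is assumed to be a set of node keys, and the Jaccard distance, which satisfies the metric axioms, is used only as an example similarity measure. Retrieval here is a plain linear scan, not the efficient metric indexing methods of [18] and [19]; the class `Case` and the functions are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Key = Tuple[int, ...]


@dataclass(frozen=True)
class Case:
    problem: FrozenSet[Key]            # problem description as a set of node keys (assumption)
    solution: str
    explanation: Optional[str] = None  # optional third component of the triple


def jaccard_distance(a: FrozenSet[Key], b: FrozenSet[Key]) -> float:
    """1 - |a ∩ b| / |a ∪ b|; a metric on finite sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def retrieve(case_base: List[Case], query: FrozenSet[Key], k: int = 1) -> List[Case]:
    """Linear-scan k-nearest-neighbour retrieval without any index."""
    return sorted(case_base, key=lambda c: jaccard_distance(c.problem, query))[:k]
```

A metric distance is what allows the indexing techniques referenced above to prune the search; the linear scan only shows the interface.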

7 Management with relational databases

A knowledge-based system that only defines a representation for concepts is not sufficient; it also has to be able to store assertional facts in order to infer conclusions from them. Managing the existing data and knowledge with relational database systems has considerable benefits. This is made possible by the instantiation of the concept keys.

7.1 Data storage

Storing the data in fixed data structures is not appropriate, given the potentially large number of concepts, deduced concepts and instances. In addition, the storage must support the operations that are essential for inference.

The storage of instances should be as compact as possible and always represent the most specific node key. The data set stores all valid keys. In the normal case, the value of a key is just “true”. The information is therefore tied to the existence of the node key. The keys can also be used for access and references to other tables with additional information for values, texts, figures etc. Furthermore, the respective data set contains a unique composite primary key id including the time stamp (Chapter 7.2).

To specify instances and store them in databases there exist, for example, the following possibilities:

  1. Most specific node key

  2. Path to the most specific node key

Both options are discussed below. The storage structure should allow additional index structures to be declared, which make it possible to implement various inference operations more efficiently.
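Option 1, storing only the most specific node key, can be sketched with Python's built-in `sqlite3` module. The table layout, column names and the dotted string encoding of keys are assumptions for illustration. To find all instances that include a concept key, the sketch treats the concept key as an initial key and uses a range condition on the dotted string, which an index on the key column can serve; concept keys containing variables are not handled.

```python
import sqlite3
from typing import List


def create_store() -> sqlite3.Connection:
    """In-memory table holding the most specific node keys per instance (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE instance (
               case_id  INTEGER NOT NULL,
               ts       TEXT    NOT NULL,  -- time stamp of the episode
               node_key TEXT    NOT NULL,  -- most specific node key, e.g. '1.2.1'
               PRIMARY KEY (case_id, ts, node_key)
           )"""
    )
    conn.execute("CREATE INDEX idx_node_key ON instance (node_key)")
    return conn


def store(conn: sqlite3.Connection, case_id: int, ts: str, node_key: str) -> None:
    """Insert one valid node key; its value is implicitly 'true'."""
    conn.execute("INSERT INTO instance VALUES (?, ?, ?)", (case_id, ts, node_key))


def instances_with(conn: sqlite3.Connection, concept_key: str) -> List[int]:
    """Cases whose stored key equals the concept key or has it as an initial key."""
    rows = conn.execute(
        "SELECT DISTINCT case_id FROM instance "
        "WHERE node_key = ? OR (node_key >= ? AND node_key < ?) ORDER BY case_id",
        (concept_key, concept_key + ".", concept_key + "/"),
    )
    return [row[0] for row in rows]
```

The range from '1.1.' up to (excluding) '1.1/' covers exactly the strings that start with '1.1.', because '/' directly follows '.' in ASCII and SQLite compares text with binary collation by default; this keeps a key such as '1.10' out of the result for the concept key '1.1'.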

7.2 Representation of chronological events - episodes

Besides the representation of the terminology, concepts and instances, instances must also be mapped such that they describe the momentary state, i.e. the current situation, of the modeled section of the world. For this purpose, episodes are used, which depict instances of the terminological concepts and inferences at a specific moment. Episodes are stored in a relational database as well.

Definition 28 (Episode)

An episode describes a concrete event that happens at a certain point in time.

Each episode is specified by attributes, each of which is defined by one or more instantiated semantic keys. Depending on the attribute, either exactly one or multiple values can be assigned to an episode. Episodic knowledge makes it possible to represent developments over time. In general, an episode is characterized more closely by its time, content and localization [24].

7.3 Storage of instances

The storage of an instance should be as compact as possible and provide access to the most specific known d-concept or concept key.

The derivation of more general d-concepts or concept keys should preferably happen in constant time. The structure should allow index structures that make it possible to efficiently find, for example, all instances that include a certain concept key.

7.3.1 Unique keys, efficiently determining predecessor nodes

Every node is given a unique key.

Only the key, which has constant length, is stored, which makes this approach storage-efficient. Deriving predecessor nodes is not trivial, but a suitable index structure can be generated: every node of a tree stores a reference to its parent node, which is easily achieved with a hash table or dictionary. By following these parent pointers, it can be determined, for example, whether a d-concept is a predecessor of an instance. The cost grows with the depth of the tree and is logarithmic in the number of nodes only for balanced trees.

The primary benefit of this variant is that maintenance of the knowledge base, including insertions, renaming (of nodes, not of indices), deletions and moving of nodes, is relatively unproblematic.
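The parent-pointer approach of Chapter 7.3.1 can be sketched with a dictionary from each node to its parent. The node names are hypothetical; the predecessor test walks up the parent pointers, so its cost is proportional to the depth of the node.

```python
from typing import Dict, Optional


def is_predecessor(candidate: str, node: str, parent: Dict[str, Optional[str]]) -> bool:
    """Walk parent pointers from `node` towards the root; O(depth of node)."""
    current = parent.get(node)
    while current is not None:
        if current == candidate:
            return True
        current = parent.get(current)
    return False
```

For example, with `{"root": None, "headache": "root", "migraine": "headache", "arm": "root"}`, "headache" is a predecessor of "migraine", whereas "arm" is not.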

7.3.2 Identification of a node with its path

It is also possible to store traversed paths. For a valid instance, the node keys of the traversed path are stored, i.e. paths with indices. If necessary, the respective named instance nodes can be stored for reindexing or documentation purposes.

For example, efficient data structures can be constructed to identify all instances that include a concept, or for further inferences.

Problematic for this form of storage are changes to the knowledge structure. The reinterpretation or reindexing is algorithmically more complex, but has a unique solution. Because changing the knowledge base should be a rare and well thought out operation, the resulting computing time should be manageable.
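Storing the traversed path together with an inverted index from node keys to instances can be sketched as below. The sketch assumes that the node keys on a path are exactly the initial keys of the stored key, which holds only if each level appends one number and no expansion variables are involved; the class `PathStore` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Key = Tuple[int, ...]


def path_of(key: Key) -> List[Key]:
    """Node keys on the path from the root, assumed to be the initial keys of `key`."""
    return [key[:i] for i in range(1, len(key) + 1)]


class PathStore:
    """Stores each instance's path and an inverted index: node key -> instance ids."""

    def __init__(self) -> None:
        self.paths: Dict[int, List[Key]] = {}
        self.index: DefaultDict[Key, Set[int]] = defaultdict(set)

    def add(self, instance_id: int, key: Key) -> None:
        path = path_of(key)
        self.paths[instance_id] = path
        for k in path:
            self.index[k].add(instance_id)

    def instances_with(self, concept_key: Key) -> Set[int]:
        # .get avoids creating empty entries in the defaultdict
        return set(self.index.get(concept_key, set()))
```

Lookups are direct dictionary accesses, but any reindexing of the hierarchy requires rewriting the stored paths and the index, which is the maintenance cost discussed above.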

7.4 Maintenance of knowledge bases

Regarding the maintenance of knowledge bases, it should not be assumed that a knowledge base is already in its final, permanently valid form when the knowledge-based system is established. Its structure emerges over time as experience is collected with the knowledge base, its structure and the knowledge inferred from it. New nodes therefore have to be inserted and old ones renamed to keep the hierarchy up to date. To keep the knowledge base manageable, it should also be possible to remove nodes that proved to be irrelevant for the domain of discourse.

The basic operations insert and delete influence not only the generic knowledge but also the stored instance knowledge. In any case, such maintenance processes have to preserve the information of previously stored instances; these must not expire by becoming indecipherable. A misinterpretation of old instances caused by the maintenance of generic knowledge structures would be even more problematic.

The delete operation removes nodes from the concept hierarchy. Removing nodes clearly leaves the indexing of all remaining nodes correct, so no reindexing is necessary, and in this respect the operation is unproblematic. If a large number of nodes is deleted, a reindexing can still be useful, because it could substantially simplify the keys.

The insert operation adds new nodes to the concept hierarchy. Two cases have to be distinguished.

Already existing concept

During the insert operation, all concepts that are more general than the concept of the newly inserted node can be kept, but all more specific concepts have to be reindexed. The insert operation is thus equivalent to continuing the indexing algorithm after all more general concepts have already been indexed. In Figure 15, the dashed node is introduced into the existing hierarchy. The two concepts that are more general than its concept do not have to be reindexed, whereas the three more specific concepts have to be reindexed, together with the concept of the inserted node itself.

Figure 15: Insert-operation.
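Determining which concepts an insertion affects can be sketched as a reachability search, assuming the dependency graph is given as a mapping from each concept to the concepts that directly depend on it (that is, occur below it). The concept names mirror the shape of Figure 15 (two more general concepts, three more specific ones) but are invented, and the function name `concepts_to_reindex` is hypothetical.

```python
from collections import deque
from typing import Dict, Set


def concepts_to_reindex(new_concept: str, more_specific: Dict[str, Set[str]]) -> Set[str]:
    """Return the inserted node's concept and every concept reachable from it (breadth-first)."""
    todo = deque([new_concept])
    seen = {new_concept}
    while todo:
        c = todo.popleft()
        for d in more_specific.get(c, set()):
            if d not in seen:
                seen.add(d)
                todo.append(d)
    return seen
```

With the edges A → B → N, N → D, N → E and E → F, inserting a node with concept N affects exactly {N, D, E, F}, while A and B keep their keys.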
New concept

A special case arises when one or more nodes with a new concept are inserted. Then only these nodes have to be indexed. Intuitively, the cost of the changes during the insert operation depends on how many concepts are affected.

These operations affect the stored instance knowledge differently, depending on where they occur relative to a node key. In any case, when a knowledge engineer alters a node (renaming it or attaching it to another parent node), it has to be decided whether instances of this node keep the same meaning after the change or whether the change amounts to a deletion and reinsertion. The same decision has to be made for all child nodes.

8 Medical system as an example application

In collaboration with physicians from the interdisciplinary work group “Schmerzmedizin” (pain medicine) of the DGSS (German Pain Society), the knowledge-based system iSuite was developed. It has been in use since 2000 and has been continuously updated. The system is an information and support system for the complex field of pain medicine. iSuite consists of several components that access a shared knowledge base, which contains generic medical knowledge as well as a case base with real patient data. All knowledge is stored in the knowledge base using a relational database system.

The system assists the physician during the treatment of a patient with automated anamnesis dialogs, documentation, research, calculations, evaluations, illustrations and suggestions. Its components access the knowledge represented in the database.

Knowledge-based agent

The system architecture as a knowledge-based agent is depicted in Figure 16.

Figure 16: Medical knowledge-based agent.

In the dialog, the agent interacts with two groups of people: physicians/medical personnel and patients. Internally, the agent supports the steps required in the medical treatment cycle. The knowledge used for this purpose is generally structured along two axes. First, the knowledge can be mapped to one or more treatment steps, i.e. there is knowledge about the anamnesis, diagnosis, therapy and medication phases as well as connecting knowledge, e.g. which diagnoses were made because of a certain anamnesis. Along a second axis, instance knowledge (e.g. the anamnesis of a certain patient at a certain point in time) and generic knowledge (i.e. general knowledge about the anamnesis) are distinguished. Instance knowledge requires interaction with the patient, which takes place through a patient dialog system that gathers anamnesis information. The dialog system enables the periodic collection of the information necessary for diagnosis and quality control, while also aiming to minimize the required workload. Information on tests, diagnoses and therapy evaluations, as well as their results, is supplied to the agent by personnel. All knowledge is stored in a shared generic knowledge representation (Chapter 2).

Knowledge base

The knowledge base for anamnesis, diagnosis, therapy and medication consists of:

  1. Concept terminology of the domain:
    The concept terminology consists of taxonomy graphs (trees) of the basic concepts and their specializations. The meaning of a concept is represented by its node and the node's position in the graph.

    • The nodes are lexical entries for terminological concepts.

    • Together, these lexical entries form the set of terminological concepts.

    • The set of relations is not represented explicitly but implicitly through the concept keys computed for each node in the graph. A relation exists if nodes are partially unifiable.

    • A directed relation describes the generalization and specialization relationships.

  2. Terminology of deduced concepts:
    A concept terminology represents the deduced concepts of the domain. A taxonomy of the deduced concepts extends the possibilities of semantic expression.

    • The nodes are lexical entries for deduced concepts.

    • Together, these lexical entries form the set of deduced concepts.

    • The set of relations is not represented explicitly but implicitly through the concept keys computed for each node in the graph. A relation exists if nodes are partially unifiable.

    • A directed relation describes the relationships “more general than” and “more specific than”.

  3. Deduced concept descriptions:
    Each deduced concept has a corresponding deduced concept description. These descriptions declare the necessary and sufficient conditions for instances of deduced concepts. Through inheritance, descriptions of superordinate deduced concepts can be passed on to subordinate deduced concepts, using the relations “more general than” and “more specific than”. The deduced concept descriptions enable the inference process.

    • One component consists of subsets of lexical entries for concepts.

    • A second component consists of subsets of lexical entries for references to concepts.

    • A reference relation defines references between terminological concepts, deduced concepts and deduced concept descriptions.

  4. Semantic indexing:

    • The terminological concept hierarchy of the domain and the hierarchy of deduced concepts form the basis of the uniform representation and the semantic indexing.

    • To avoid suboptimal forced hierarchization and to enable precise expressions, a multiaxial representation should be used.

    • Based on these concept hierarchies, the indexing algorithm computes and assigns a unique concept key to each terminological concept and each deduced concept. This allows terminological relationships to be represented through partial unification.

    • At the same time, the concept index allows a fast search for the vocabulary of the contained concepts, their generalizations and their specializations. The keys encode the semantically correct relations and inheritance relationships.
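The following Python sketch illustrates how partial unification and the fast search for generalizations and specializations could work. It assumes a hypothetical key format in which a concept key is the tuple of child positions on the path from the root (e.g., `(1, 1, 2)`), so that partial unification reduces to a prefix test; the indexing algorithm of this paper, in particular for multiaxial hierarchies, may encode keys differently. The names `ConceptIndex`, `inherited_description` and `satisfies`, as well as the example concepts, are illustrative assumptions.

```python
from bisect import bisect_left
from typing import Dict, List, Set, Tuple

# Assumed key format (hypothetical): child positions on the path from the root.
Key = Tuple[int, ...]


def partially_unifiable(general: Key, specific: Key) -> bool:
    """True if `general` equals `specific` or is one of its ancestors (prefix test)."""
    return specific[: len(general)] == general


class ConceptIndex:
    """Concepts sorted by key, so that all specializations of a concept are adjacent."""

    def __init__(self, concepts: Dict[Key, str]) -> None:
        self.entries = sorted(concepts.items())
        self.keys = [key for key, _ in self.entries]
        self.names = dict(concepts)

    def specializations(self, key: Key) -> List[str]:
        """The concept itself and all more specific concepts (one contiguous range)."""
        result = []
        for other, name in self.entries[bisect_left(self.keys, key):]:
            if not partially_unifiable(key, other):
                break  # left the range of keys that start with `key`
            result.append(name)
        return result

    def generalizations(self, key: Key) -> List[str]:
        """All more general concepts, read directly off the proper prefixes of the key."""
        prefixes = (key[:i] for i in range(1, len(key)))
        return [self.names[p] for p in prefixes if p in self.names]


def inherited_description(concept: Key, descriptions: Dict[Key, Set[Key]]) -> Set[Key]:
    """Conditions of a deduced concept plus those inherited from superordinate concepts."""
    conditions: Set[Key] = set()
    for i in range(1, len(concept) + 1):
        conditions |= descriptions.get(concept[:i], set())
    return conditions


def satisfies(findings: Set[Key], conditions: Set[Key]) -> bool:
    """Every required concept must be matched by an equal or more specific finding."""
    return all(any(partially_unifiable(c, f) for f in findings) for c in conditions)


# Example terminology (illustrative only).
terms = {
    (1,): "finding", (1, 1): "pain", (1, 1, 1): "chest pain",
    (1, 1, 2): "headache", (1, 2): "fever", (2,): "diagnosis",
}
index = ConceptIndex(terms)
print(index.specializations((1, 1)))     # ['pain', 'chest pain', 'headache']
print(index.generalizations((1, 1, 2)))  # ['finding', 'pain']

# Deduced concepts live in their own hierarchy; their descriptions refer to terminology keys.
deduced = {(1,): {(1, 1)}, (1, 1): {(1, 2)}}
conditions = inherited_description((1, 1), deduced)
print(satisfies({(1, 1, 1), (1, 2)}, conditions))  # True
print(satisfies({(1, 1, 1)}, conditions))          # False
```

Because all keys that share a prefix are adjacent in sorted order, the specializations of a concept form one contiguous range; an ordinary B-tree index in a relational database can exploit the same property.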

Abstracted structure of the knowledge base

Through the integration of concept- and case-based reasoning, specific and generic knowledge can be represented and processed uniformly. Figure 17 shows a strongly abstracted structure of the knowledge base using partialization (P), specialization (S) and similarity. Here, partialization represents a part-of relation, while specialization represents an instantiation.

Figure 17: Abstracted structure of the knowledge base.
Dialog system

On the basis of the concept terminology and the semantic indexing, a dialog system was built that supports the physician in treating the patient. The representation described so far is not sufficient on its own for the dialog system, because additional specifications are needed in practice. The anamnesis concept terminology is therefore extended with, among other things:

  • (question-)text as a comprehensible statement

  • alternative text for additional information, explanations, graphics, etc.

  • question types, e.g. single-selection, multi-selection

  • handling of different forms of negation

  • closed-world or open-world semantics

  • optional, unconditional or default answers

  • possible additional notifications

  • etc.
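To illustrate how a few of these extensions change the interpretation of answers, the following Python sketch models an anamnesis question with a question type and a closed-world or open-world flag. Under closed-world semantics, options the patient did not select are recorded as negated; under open-world semantics, they remain unknown. The class, its field names and the example questions are hypothetical and cover only part of the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class QuestionType(Enum):
    SINGLE_SELECTION = "single"
    MULTI_SELECTION = "multi"


@dataclass
class AnamnesisQuestion:
    text: str                                  # (question-)text shown to the patient
    options: List[str]
    qtype: QuestionType = QuestionType.MULTI_SELECTION
    closed_world: bool = True                  # unselected options count as negated
    help_text: Optional[str] = None            # alternative text / explanation
    default: Set[str] = field(default_factory=set)

    def interpret(self, selected: Optional[Set[str]]) -> Dict[str, Optional[bool]]:
        """Map each option to True (affirmed), False (negated) or None (unknown)."""
        if selected is None:                   # question skipped: use the default answer
            selected = self.default
        if self.qtype is QuestionType.SINGLE_SELECTION and len(selected) > 1:
            raise ValueError("single-selection question answered with several options")
        unselected = False if self.closed_world else None
        return {o: (True if o in selected else unselected) for o in self.options}


pain_site = AnamnesisQuestion("Where is the pain?", ["chest", "head", "abdomen"])
symptoms = AnamnesisQuestion("Which symptoms occur?", ["nausea", "dizziness"],
                             closed_world=False)
print(pain_site.interpret({"chest"}))   # {'chest': True, 'head': False, 'abdomen': False}
print(symptoms.interpret({"nausea"}))   # {'nausea': True, 'dizziness': None}
```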

The anamnesis representation has to be extended with the respective specifications, which can generally be done with little technical effort. The concept terminology of the anamnesis forms the basis of the dialog system and is processed with depth-first search and backtracking.

Other algorithms can organize the dialog more efficiently and intelligently. For example, it makes sense to ask only questions that are relevant to the current situation and to deliberately skip certain subtrees depending on the course of the dialog and on knowledge that is already available.
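A minimal Python sketch of the depth-first dialog with backtracking described above, under simplifying assumptions: each anamnesis node is a yes/no question, a negated answer prunes the node's subtree (so no affirmed node can appear below a negated one), and answers already known from earlier episodes are reused instead of being asked again. The `Question` class, the `ask` callback and the sample questions are hypothetical; question types, negation forms and default answers are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[int, ...]  # hypothetical concept key of an anamnesis node


@dataclass
class Question:
    key: Key
    text: str
    children: List["Question"] = field(default_factory=list)


def run_dialog(root: Question, ask: Callable[[Question], bool],
               known: Optional[Dict[Key, bool]] = None) -> Dict[Key, bool]:
    """Traverse the anamnesis tree depth-first and return all answers."""
    answers: Dict[Key, bool] = dict(known or {})

    def visit(node: Question) -> None:
        if node.key not in answers:  # skip questions that are already answered
            answers[node.key] = ask(node)
        if not answers[node.key]:
            return                   # negated: omit the whole subtree
        for child in node.children:
            visit(child)
        # returning from visit() backtracks to the next sibling of this node

    visit(root)
    return answers


tree = Question((1,), "Do you have pain?", [
    Question((1, 1), "Is it chest pain?"),
    Question((1, 2), "Is it a headache?", [Question((1, 2, 1), "Is it throbbing?")]),
])

asked: List[str] = []


def patient(q: Question) -> bool:
    asked.append(q.text)
    return q.key != (1, 2)  # this simulated patient denies only the headache


answers = run_dialog(tree, patient)
print(asked)  # the throbbing question is never asked
```

Passing `known={(1,): True}` on a later visit would start the dialog directly with the pain subtree, which is one simple way of using existing knowledge to shorten the dialog.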

Monotonicity and consistency conditions

The case base should only ever contain episodes that are valid, both at the current time and independently of the answers to the remaining questions. Additional episodes can extend the existing knowledge but not revise it. If entries are corrected, all dependent earlier answers have to be revised. It must also be guaranteed that the relation (more specific, more general) of Definition 3 always holds, i.e., that no valid node appears below a negated node.

Since entries in the knowledge base of a dialog system cannot always be interpreted unambiguously, additional generic knowledge can be provided that detects violations of the consistency conditions.
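These conditions can be checked mechanically if, as a simplifying assumption, an episode is modeled as a mapping from concept keys (hypothetical tuple paths in which an ancestor's key is a prefix of its descendants' keys) to affirmed (`True`) or negated (`False`). The Python sketch below detects affirmed nodes below a negated node, tests whether a new episode only extends existing knowledge, and applies a correction by discarding the dependent answers below a newly negated node so that they can be gathered again. This handling of corrections is one possible interpretation, not necessarily the authors' procedure; all function names are illustrative.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, ...]      # hypothetical path key; ancestors are prefixes
Episode = Dict[Key, bool]  # True = affirmed, False = negated


def is_below(ancestor: Key, key: Key) -> bool:
    return len(ancestor) < len(key) and key[: len(ancestor)] == ancestor


def violations(findings: Episode) -> List[Tuple[Key, Key]]:
    """Pairs (negated node, affirmed node below it) that break the hierarchy condition."""
    negated = [k for k, v in findings.items() if not v]
    return [(n, k) for k, v in findings.items() if v for n in negated if is_below(n, k)]


def extends_monotonically(existing: Episode, new: Episode) -> bool:
    """A new episode may add findings but must not revise existing ones."""
    return all(existing[k] == v for k, v in new.items() if k in existing)


def correct(findings: Episode, key: Key, value: bool) -> Episode:
    """Apply a correction; if `key` becomes negated, answers below it are dropped."""
    updated = dict(findings)
    updated[key] = value
    if not value:
        for k in [k for k in updated if is_below(key, k)]:
            del updated[k]
    return updated


episode = {(1,): True, (1, 2): True, (2,): True}
print(violations({(1,): False, (1, 2): True}))       # [((1,), (1, 2))]
print(extends_monotonically(episode, {(3,): True}))  # True
print(extends_monotonically(episode, {(1,): False})) # False
print(correct(episode, (1,), False))                 # {(1,): False, (2,): True}
```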

Case-based reasoning

In addition to generic domain knowledge, the practical knowledge collected in the patient base is used in the form of case-based reasoning. When diagnoses and therapy recommendations are made for a patient, situations similar to the current case are searched for in the set of all patient cases. The recommendations given for similar previous cases are evaluated with respect to whether they were actually applied and how successful the resulting therapy was. From these experiences, recommendations for the current case can be derived or discarded. During therapy, success, side effects, etc. are recorded. After the current problem has been treated, the case is added to the case base as well.
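The retrieve-and-reuse cycle described above can be sketched in Python as follows, assuming that each stored case records its findings as a set of concept keys, the therapy that was actually applied and a success score between 0 and 1 (a hypothetical outcome measure). The k most similar cases are retrieved, each therapy's success is averaged with similarity weights, and therapies below a success threshold are discarded. The Jaccard coefficient serves only as a placeholder; it ignores the concept hierarchy and is not the measure used by the authors. `Case`, `recommend`, `k` and `min_success` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Key = Tuple[int, ...]


@dataclass(frozen=True)
class Case:
    findings: FrozenSet[Key]  # node keys describing the problem
    therapy: str              # recommendation that was actually applied
    success: float            # hypothetical outcome score in [0, 1]


def jaccard(a: FrozenSet[Key], b: FrozenSet[Key]) -> float:
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def recommend(query: FrozenSet[Key], case_base: List[Case], k: int = 3,
              min_success: float = 0.5) -> List[Tuple[str, float]]:
    """Rank therapies of the k most similar cases by similarity-weighted success."""
    neighbours = sorted(case_base, key=lambda c: jaccard(query, c.findings), reverse=True)[:k]
    weight: Dict[str, float] = {}
    gained: Dict[str, float] = {}
    for case in neighbours:
        s = jaccard(query, case.findings)
        weight[case.therapy] = weight.get(case.therapy, 0.0) + s
        gained[case.therapy] = gained.get(case.therapy, 0.0) + s * case.success
    scores = {t: gained[t] / weight[t] for t in weight if weight[t] > 0}
    # Therapies that performed poorly in similar cases are discarded.
    kept = [(t, score) for t, score in scores.items() if score >= min_success]
    return sorted(kept, key=lambda item: item[1], reverse=True)


case_base = [
    Case(frozenset({(1, 1), (1, 2)}), "therapy A", 0.9),
    Case(frozenset({(1, 1)}), "therapy B", 0.2),
    Case(frozenset({(2,)}), "therapy C", 1.0),
]
print(recommend(frozenset({(1, 1), (1, 2)}), case_base))  # [('therapy A', 0.9)]
```

Therapy C never contributes because its case shares no findings with the query, and therapy B is discarded because it was rarely successful in the one similar case.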

Case base, relational database system

Realizing this CBR system raises several requirements for the case base. All relevant attributes of a case have to be represented and stored in the case base. Such a system runs over a long period of time and accumulates a relatively large case base. Therefore, data-independent and consistent storage meeting the quality standards of a (relational) database management system is required. Moreover, the storage system should not be developed for one specific application. The design of the case representation should also be general and not driven by the requirements of the storage system (data independence). In iSuite, cases are stored as sequences of episodes, each of which consists of a set of node keys.
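As an illustration of storing cases as sequences of episodes made of node keys in a classical relational database, the following Python sketch uses the standard `sqlite3` module. The schema, the table and column names, and the text encoding of keys (path components each followed by a dot, e.g. `'1.1.2.'`) are assumptions made for this example, not the iSuite schema. With this encoding, all findings at or below a concept fall into one key range, so an ordinary index on the key column supports the query.

```python
import sqlite3
from typing import Iterable, List, Tuple

Key = Tuple[int, ...]  # hypothetical path key, e.g. (1, 1, 2)

SCHEMA = """
CREATE TABLE episode (
    episode_id INTEGER PRIMARY KEY,
    case_id    INTEGER NOT NULL,
    seq        INTEGER NOT NULL,            -- position of the episode within its case
    UNIQUE (case_id, seq)
);
CREATE TABLE episode_key (
    episode_id INTEGER NOT NULL REFERENCES episode (episode_id),
    node_key   TEXT    NOT NULL,            -- encoded concept key, e.g. '1.1.2.'
    PRIMARY KEY (episode_id, node_key)
);
CREATE INDEX idx_node_key ON episode_key (node_key);
"""


def encode(key: Key) -> str:
    # The trailing dot keeps '1.1.' from matching '1.10.' in range queries.
    return "".join(f"{part}." for part in key)


def add_episode(db: sqlite3.Connection, case_id: int, seq: int, keys: Iterable[Key]) -> None:
    cur = db.execute("INSERT INTO episode (case_id, seq) VALUES (?, ?)", (case_id, seq))
    db.executemany("INSERT INTO episode_key (episode_id, node_key) VALUES (?, ?)",
                   [(cur.lastrowid, encode(k)) for k in keys])


def cases_with(db: sqlite3.Connection, concept: Key) -> List[int]:
    """Cases with a finding equal to or more specific than a non-empty `concept`."""
    low = encode(concept)
    high = low[:-1] + "/"  # '/' directly follows '.' in ASCII, so this bounds the range
    rows = db.execute(
        "SELECT DISTINCT e.case_id FROM episode_key AS k "
        "JOIN episode AS e ON e.episode_id = k.episode_id "
        "WHERE k.node_key >= ? AND k.node_key < ? ORDER BY e.case_id",
        (low, high),
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
add_episode(db, case_id=1, seq=1, keys=[(1, 1, 2), (2,)])
add_episode(db, case_id=2, seq=1, keys=[(1, 10)])
add_episode(db, case_id=3, seq=1, keys=[(1, 1)])
print(cases_with(db, (1, 1)))  # [1, 3]
```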

Similarity measure

Unlike in a classical relational system, retrieval must efficiently find the cases in the case base that are similar to the current case with respect to a similarity measure, rather than only exact matches.

Because the case base is large and the similarity function can be very expensive to compute, retrieval has to be more efficient than a naive linear scan of the case base.
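One standard way to beat a linear scan is to index the case base with a metric access method and use the triangle inequality to skip cases without computing their distance to the query; tree structures such as the M-tree refine this idea. The Python sketch below shows only the principle, with a single precomputed pivot, and assumes that dissimilarity is expressed as a metric distance (here the Jaccard distance between sets of node keys, chosen purely for illustration). It is a simplified stand-in, not the retrieval method of this system; `PivotIndex` and the sample data are hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple

Key = Tuple[int, ...]
Findings = FrozenSet[Key]


def jaccard_distance(a: Findings, b: Findings) -> float:
    """1 - |a & b| / |a | b|, a metric on finite sets."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


class PivotIndex:
    def __init__(self, cases: List[Findings], pivot: Findings,
                 dist: Callable[[Findings, Findings], float] = jaccard_distance) -> None:
        self.cases = cases
        self.pivot = pivot
        self.dist = dist
        self.to_pivot = [dist(c, pivot) for c in cases]  # computed once, at build time
        self.evaluations = 0  # distance computations spent on candidate cases

    def range_search(self, query: Findings, radius: float) -> List[Findings]:
        """All cases within `radius` of `query`, skipping provably distant cases."""
        dq = self.dist(query, self.pivot)
        hits = []
        for case, dp in zip(self.cases, self.to_pivot):
            # Triangle inequality: dist(query, case) >= |dq - dp|, so the case cannot qualify.
            if abs(dq - dp) > radius:
                continue
            self.evaluations += 1
            if self.dist(query, case) <= radius:
                hits.append(case)
        return hits


cases = [frozenset({(1, 1)}), frozenset({(1, 1), (1, 2)}), frozenset({(2,)}),
         frozenset({(2,), (3,)}), frozenset({(3,)})]
index = PivotIndex(cases, pivot=frozenset({(1, 1)}))
query = frozenset({(1, 1), (1, 2)})
print(index.range_search(query, radius=0.4))
print(index.evaluations)  # only 1 of the 5 case distances was computed
```

The result is identical to a linear scan; the gain comes from the distances to the pivot, which are computed once when the case is stored rather than at query time.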

Before searching for similar cases, it must be clearly defined how similarity is measured in the context of the application. This is a task for the domain expert of the application. In particular, it is difficult to create a measure that corresponds to the medical perception of similarity. Furthermore, similarities in the problem space should mirror analogies in the solution space. For this reason, the similarity measure is usually changed several times during the development and use of a case-based system, so data independence is required here as well.
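Because the similarity measure is expected to change, it helps to keep it out of the stored data entirely. In the Python sketch below, cases remain plain sets of node keys and the measure is a replaceable function. The hierarchy-aware key similarity is an assumption modeled on the Wu-Palmer measure (twice the length of the shared key prefix divided by the sum of the key lengths), so that, for example, chest pain and headache count as partially similar because both are specializations of pain; it is not the measure used by the authors. Function names and example keys are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

Key = Tuple[int, ...]                      # hypothetical non-empty path key
KeySimilarity = Callable[[Key, Key], float]


def shared_prefix(a: Key, b: Key) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_similarity(a: Key, b: Key) -> float:
    """Wu-Palmer-style similarity: shared path length relative to both key lengths."""
    return 2 * shared_prefix(a, b) / (len(a) + len(b))


def exact_similarity(a: Key, b: Key) -> float:
    return 1.0 if a == b else 0.0


def case_similarity(query: FrozenSet[Key], case: FrozenSet[Key],
                    key_sim: KeySimilarity = prefix_similarity) -> float:
    """Mean over the query findings of the best-matching finding in the case."""
    if not query:
        return 0.0
    best = (max((key_sim(q, c) for c in case), default=0.0) for q in query)
    return sum(best) / len(query)


chest_pain, headache = (1, 1, 1), (1, 1, 2)
query, stored = frozenset({chest_pain}), frozenset({headache})
print(case_similarity(query, stored))                    # about 0.667
print(case_similarity(query, stored, exact_similarity))  # 0.0
```

Swapping `key_sim` changes retrieval behavior without touching the stored cases. Note that this case-level measure is asymmetric, which may or may not be desirable for a given application.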

Remarks

The idea behind the described paradigm is to treat episodic practical knowledge and general generic knowledge equally in order to improve system performance. Achieving this goal requires different methods:

  • For generic knowledge, general description logics are very powerful and highly expressive. Accordingly, the computations are very costly.

  • On the one hand, the calculus used for application systems needs high expressiveness to represent all necessary dependencies. On the other hand, an overly general logic should be avoided, because otherwise insufficient guarantees can be given regarding the time and space complexity of the computations.

  • The described representation was developed to enable fast computation at run time, e.g., for many patients and thus many episodes. It allows a uniform representation, effective knowledge retrieval, inference, CBR and other necessary methods.

  • Furthermore, generic medical knowledge is characterized by increased vagueness, uncertainty and incompleteness. The main reasons for this are the complexity of medicine, the individuality of patients and the need to deviate from sharp descriptions. For “more complete” solutions to problems, Bayesian networks, which can process uncertain high-quality concepts, may also be used for generic knowledge processing in addition to concept-based reasoning. Here, expectation propagation is the method of choice [16]. To determine the required a priori probabilities, distributions of classificatory features can be obtained with the method of [25].

9 Summary

The indexing algorithm assigns keys that contain the complete path from the most general to the most specific knowledge. These semantic keys carry information and enable the representation of complex relationships. The described approach allows a uniform representation that, on the one hand, permits the knowledge-based modeling and representation of practical domains in classic relational databases and, on the other hand, ensures an efficient analysis of the represented data and knowledge base. It thus connects the two areas of knowledge representation and relational databases. As a result of the uniform representation, the structure, architecture and implementation of knowledge bases can be improved. This is achieved primarily through the structured storage of knowledge, the storage of semantically well-defined instances in a knowledge base, and the variety of possible inference methods. The approach is particularly suited for domains with a clear terminological structure.

Acknowledgment

We thank Mr. H.L. Biskupski for his help with the preparation of this manuscript.

References

  • [1] The University of Manchester Website - FaCT++ reasoner. http://owl.cs.manchester.ac.uk/tools/fact/. Accessed: 2019-07-26.
  • [2] W3C RDFDAML+OIL Website. https://www.w3.org/TR/daml+oil-reference, Dec. 2001. Accessed: 2019-07-26.
  • [3] Loom Project Website. https://www.isi.edu/isd/LOOM/, July 2007. Accessed: 2019-07-27.
  • [4] W3C RDF Schema Website. https://www.w3.org/TR/rdf-schema/, Feb. 2014. Accessed: 2019-07-26.
  • [5] W3C RDF Website. https://www.w3.org/RDF/, Feb. 2014. Accessed: 2019-07-29.
  • [6] CYC Homepage. https://www.cyc.com/documentation/ontologists-handbook/cyc-basics/syntax-cycl/, 2019. Accessed: 2019-07-26.
  • [7] Semafora Systems Website - Produkt OntoStudio. http://www.semafora-systems.com, Dec. 2019. Accessed: 2019-07-27.
  • [8] F. Baader. What’s new in Description Logics. Informatik-Spektrum, 34:434–442, 2011.
  • [9] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook - Theory, Implementation and Applications. Cambridge University Press, 2003.
  • [10] F. Baader and B. Hollunder. KRIS: Knowledge Representation and Inference System, System Description. ACM SIGART Bulletin, 2(3):8–14, June 1991.
  • [11] R. J. Brachman, V. Pigman Gilbert, and H. J. Levesque. An essential hybrid reasoning system: knowledge and symbol level accounts of KRYPTON. In Proceedings of the 9th International Joint Conference on Artificial Intelligence, pages 532–539. Morgan Kaufmann, 1985.
  • [12] R. J. Brachman and J. G. Schmolze. An overview of the KL-ONE Knowledge Representation System. Cognitive Science, 9(2):171–216, 1985.
  • [13] W. W. Cohen and H. Hirsh. Learning the CLASSIC Description Logic: Theoretical and Experimental Results. In Principles of Knowledge Representation and Reasoning: Proceedings of the Fourth International Conference (KR’94), pages 121–133. Morgan Kaufmann, 1994.
  • [14] O. Curé, W. Xu, H. Naacke, and P. Calvez. Extending LiteMat toward RDFS++. In LASCAR Workshop on Large Scale RDF Analytics, Portorož, Slovenia, June 2019.
  • [15] D. Fensel, F. van Harmelen, I. Horrocks, D. L. McGuinness, and P. F. Patel-Schneider. OIL: An Ontology Infrastructure for the Semantic Web. IEEE Intelligent Systems, 16:38–45, Apr. 2001.
  • [16] S. Flügge, S. Zimmer, and U. Petersohn. Knowledge representation and diagnostic inference using Bayesian networks in the medical discourse. CoRR, abs/1909.08549, 2019.
  • [17] T. R. Gruber. Ontolingua: A Mechanism to Support Portable Ontologies. Knowledge Systems Laboratory - Stanford University, 1992.
  • [18] S. Guhlemann, U. Petersohn, and K. Meyer-Wegener. Optimizing Similarity Search in the M-Tree. Datenbanksysteme für Business, Technologie und Web (BTW 2017), pages 485–504, March 2017.
  • [19] S. Guhlemann, U. Petersohn, and K. Meyer-Wegener. Reducing the Distance Calculations when Searching an M-Tree. Datenbank-Spektrum 17(2), pages 155–167, 2017.
  • [20] V. Haarslev and R. Möller. RACER System Description. In R. Goré, A. Leitsch, and T. Nipkow, editors, International Joint Conference on Automated Reasoning, IJCAR’2001, June 18-23, Siena, Italy, pages 701–705. Springer-Verlag, 2001.
  • [21] H. Jiang, H. Lu, W. Wang, and B. C. Ooi. XR-Tree: Indexing XML Data for Efficient Structural Joins. In ICDE, pages 253–263, 2003.
  • [22] M. Kifer and G. Lausen. F-logic: A Higher-order Language for Reasoning About Objects, Inheritance, and Scheme. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, SIGMOD ’89, pages 134–146, New York, NY, USA, 1989. ACM.
  • [23] J. Kolodner. Case-Based Reasoning. Morgan Kaufmann, San Francisco, Calif, 1993.
  • [24] W. Oertel and U. Petersohn. Managing Episodes, Cases, Concepts, and Rules - an Integration Approach for Medical Problem Solving. In D. Aha and J. Daniels, editors, Case-Based Reasoning Integrations: Papers from the 1998 Workshop, Madison, Wisconsin, USA, Menlo Park, 1998. AAAI Press.
  • [25] U. Petersohn, T. Dedek, S. Zimmer, and H. Biskupski. Causal statistical modeling and calculation of distributions by classificatory features. arXiv 2019, in preparation.
  • [26] U. Petersohn, S. Guhlemann, L. Iwer, and R. Balzer. Problem solving with Concept- and Case-based Reasoning. In T. Burczynski, W. Cholewa, and W. Moczulski, editors, Proceedings of the Conference Developments in Artificial Intelligence Methods, pages 259–270, 2009. AI-METH Series on Artificial Intelligence Methods, Gliwice, Poland.
  • [27] G. Schreiber, B. Wielinga, H. Akkermans, W. Van de Velde, and A. Anjewierden. CML: The commonKADS conceptual modelling language. In L. Steels, G. Schreiber, and W. Van de Velde, editors, A Future for Knowledge Acquisition, pages 1–25, Berlin, Heidelberg, 1994. Springer Berlin Heidelberg.
  • [28] S. Staab and R. Studer. Handbook on Ontologies. Springer Science & Business Media, Berlin Heidelberg, 2013.
  • [29] B. Swartout, R. Patil, K. Knight, and T. Russ. Ontosaurus: A Tool for Browsing and Editing Ontologies. http://ksi.cpsc.ucalgary.ca/KAW/KAW96/swartout/ontosaurus_demo.html, 2019. Accessed: 2019-07-27.
  • [30] S. Tobies. A PSpace-algorithm for ALCQI-satisfiability. LTCS-Report LTCS-99-09, LuFG Theoretical Computer Science, RWTH Aachen, Germany, 1999. See http://www-lti.informatik.rwth-aachen.de/Forschung/Papers.html.
  • [31] A. Zaiss, B. Graubner, F. Ingenerf, F. Leiner, U. Lochmann, M. Schopen, U. Schrader, and S. Schulz. Handbuch der medizinischen Informatik, chapter Medizinische Dokumentation, Terminologie und Linguistik, pages 89–144. Carl Hanser Verlag, München, Wien, 2nd, completely revised edition, 2005.