Redundancy-free Verbalization of Individuals for Ontology Validation

We investigate the problem of verbalizing Web Ontology Language (OWL) axioms of domain ontologies. Existing approaches address the fidelity of verbalized OWL texts to OWL semantics by exploring different ways of expressing the same OWL axiom in various linguistic forms. They also group and aggregate the natural language (NL) sentences generated for each OWL statement into a comprehensible structure. However, no effort has been made to perform a semantic reduction at the logical level to remove redundancies and repetitions, so that the reduced set of axioms can be used to generate a more meaningful and human-understandable (what we call redundancy-free) text. Our experiments show that formal semantic reduction at the logical level is very helpful for generating redundancy-free descriptions of ontology entities. In this paper, we focus in particular on generating descriptions of individuals of SHIQ based ontologies. The details of a case study are provided to support the usefulness of redundancy-free NL descriptions of individuals in a knowledge-validation application.


Introduction

Description Logic based ontologies, such as Web Ontology Language (OWL) ontologies, represent the knowledge of a domain in the form of logical axioms, so that an intelligent agent, with the help of a reasoning system, can make use of them in several applications.

As an ontology evolves over a period of time, it can grow in size and complexity, and unless the updates are carefully carried out, its quality might degrade. To prevent such quality depletion, an ontology development cycle is usually accompanied by a validation phase, in which knowledge engineers and domain experts meet to review the status of the ontology.

In a typical validation phase, new axioms are added and existing axioms are altered or removed to maintain the correctness of the ontology. Even though there are automated methods for generating new OWL axioms from a given knowledge source [Bühmann and Lehmann2013], the conventional method for incorporating new axioms and validating the ontology involves a validity check by domain experts. Domain experts who perform this check cannot be expected to be highly knowledgeable about formal methods and notations. For their convenience, the OWL axioms first have to be converted into corresponding natural language (NL) texts. Ontology verbalizers and ontology authoring tools such as ACE [Kaljurand and Fuchs2007], NaturalOWL [Androutsopoulos, Lampouras, and Galanis2014] and the SWAT tools [Third, Williams, and Power2011] can be utilized for generating controlled natural language (CNL) descriptions of OWL statements. But the verbatim fidelity of such descriptions to the underlying OWL statements makes them a poor choice for ontology validation: the descriptions can be confusing to a person who is not familiar with formal constructs, and it is difficult to correctly infer the intended meaning from them. This issue has been reported previously in papers such as [Stevens et al.2011, Third, Williams, and Power2011], where the authors tried to overcome it by applying operations such as grouping and aggregation to the verbalized text. But since fidelity is treated at the NL text level, the opportunity for a semantic reduction of the OWL statements to a more meaningful, human-understandable representation is forgone.

For example, consider the following logical axioms (from the People & Pets ontology, http://www.cs.man.ac.uk/horrocks/ISWC2003/Tutorial/people+pets.owl.rdf), shown both in description logic (DL) and in the Manchester OWL Syntax.

• Cat_Owner ⊑ Person ⊓ ∃hasPet.Animal ⊓ ∃hasPet.Cat

• Cat_Owner(sam)

ObjectProperty:<uri#hasPet>
Class: <uri#Person>
Class: <uri#Animal>

Class: <uri#Cat_Owner>
SubClassOf:
<uri#hasPet>
some <uri#Animal>,
<uri#Person>

Individual: <uri#sam>
Types: <uri#Cat_Owner>


The existing tools generate different variants of the CNL texts like the following as results:

• A cat-owner is a person. A cat-owner has as pet an animal. A cat-owner has as pet a cat. Sam is a cat-owner.

or (with grouping and aggregation)

• A cat-owner is a person. A cat-owner is all of the following: something that has pet an animal, and something that has pet a cat; Examples: sam.

Even though grouping and aggregation make these texts less close to the raw OWL statements, the usefulness of the description is still hindered by the redundancies present in the text.

We propose a system that performs the required additional processing of restrictions, so that the redundant (portions of the) restrictions can be removed to generate a more semantically comprehensible description. From an application point of view, we focus in this paper on generating textual descriptions of individuals for validating SHIQ based ontologies.

Descriptions of individuals are currently generated with importance given to the technical correctness of the text rather than to its naturalness and fluency. For the previous example, we expect the system to produce a text similar to: Sam: is a cat-owner having at least one cat as pet; the redundant portion of the text, “has as pet an animal”, can be removed (since an expert can clearly infer it from “having at least one cat as pet”).
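The intended reduction can be sketched in a few lines of Python (a minimal illustration, not the paper's implementation; the `subsumptions` set stands in for entailments that a DL reasoner would supply, and all names are hypothetical):

```python
# Minimal sketch: drop an existential restriction "some r.C" when a
# stricter one "some r.D" (with D subsumed by C) is also present.
# `subsumptions` stands in for entailments a DL reasoner would provide.

subsumptions = {("Cat", "Animal")}  # Cat is subsumed by Animal

def subsumed_by(d, c):
    return (d, c) in subsumptions

def reduce_existentials(restrictions):
    """Keep only the most specific (role, filler) existential restrictions."""
    kept = []
    for (role, filler) in restrictions:
        redundant = any(
            other_role == role and other_filler != filler
            and subsumed_by(other_filler, filler)
            for (other_role, other_filler) in restrictions
        )
        if not redundant:
            kept.append((role, filler))
    return kept

print(reduce_existentials([("hasPet", "Animal"), ("hasPet", "Cat")]))
# -> [('hasPet', 'Cat')]
```

Here ∃hasPet.Animal is dropped because the stricter ∃hasPet.Cat implies it.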

In the empirical evaluation section, we seek to validate the following two propositions using a case study. Firstly, semantic-level reduction of redundancies and repetitions can significantly improve the clarity of domain knowledge descriptions. Secondly, NL descriptions of the individuals of an ontology are useful in validating a given knowledge source.

Related Work

Over the last decade, several CNLs such as Attempto Controlled English (ACE) [Kaljurand and Fuchs2007, Kaljurand2007], Ordnance Survey’s Rabbit [Hart, Dolbear, and Goodwin2007], and Sydney OWL Syntax (SOS) [Cregan, Schwitter, and Meyer2007] have been designed specifically for the ontology language OWL. All these languages are meant to make interactions with formal ontological statements easier and faster for users who are unfamiliar with formal notations. Unlike the other languages [Hewlett et al.2005, Jarrar, Maria, and Dongilli2006, Androutsopoulos, Lampouras, and Galanis2014] that have been suggested for representing OWL in controlled English, these CNLs are designed to have formal language semantics and a bidirectional mapping between NL fragments and OWL constructs. Even though the formal semantics and bidirectional mapping help enable a formal check that the resulting NL expressions are unambiguous, these languages generate a collection of unordered sentences that is difficult to comprehend.

To use these CNLs as a means for ontology authoring and for knowledge validation purposes, appropriate organization of the verbalized text is necessary. A detailed comparison of systems that organize such NL texts is given in [Stevens et al.2011]. Among these, the Semantic Web Authoring (SWAT) tools [Third, Williams, and Power2011] are among the most recent and prominent; they use standard techniques from computational linguistics to make the verbalized text more readable, giving more clarity to the generated text through grouping, aggregation and elision. However, the SWAT tools focus on the linguistic form of the sentences rather than on their logical form, and hence their NL representations still have deficiencies.

In this paper, we show that by performing an entailment-based reduction at the logical level, and then an NL mapping and enhancement over the reduced formalisms, a more meaningful, human-understandable (what we call redundancy-free) representation can be obtained.

Preliminaries

SHIQ Ontologies

The description logic (DL) SHIQ is based on an extension of the well-known logic ALC [Schmidt-Schauß and Smolka1991], with added support for role hierarchies, inverse roles, transitive roles, and qualifying number restrictions [Horrocks, Sattler, and Tobies2000].

We assume N_C and N_R to be countably infinite disjoint sets of atomic concepts and atomic roles respectively. A SHIQ role is either some r ∈ N_R or an inverse role r⁻ with r ∈ N_R. To avoid considering roles such as r⁻⁻, we define a function Inv(·) which returns the inverse of a role: Inv(r) = r⁻ and Inv(r⁻) = r.

The set of concepts in SHIQ is recursively defined using the constructors in Table 2, where C and D are concepts, r and s are roles, and n is a positive integer. A SHIQ based ontology — denoted as a pair O = (T, A), where T denotes the terminological axioms (also known as the TBox) and A represents the assertional axioms (also known as the ABox) — is a set of axioms of the types specified in Table 2. A role r in O is transitive if Tran(r) ∈ T or Tran(Inv(r)) ∈ T. Given an O, let ⊑* be the smallest transitive reflexive relation between roles such that r ⊑ s ∈ T implies r ⊑* s and Inv(r) ⊑* Inv(s). For a SHIQ ontology O, the role s in every concept of the form ≥n s.C and ≤n s.C in O should be simple, that is, r ⊑* s holds for no transitive role r [Baader et al.2003].

The semantics of SHIQ is defined using interpretations. An interpretation I is a pair (Δ^I, ·^I), where Δ^I is a non-empty set called the domain of the interpretation and ·^I is the interpretation function. The function ·^I assigns a set A^I ⊆ Δ^I to every A ∈ N_C, and a relation r^I ⊆ Δ^I × Δ^I to every r ∈ N_R. The interpretation of an inverse role is (r⁻)^I = {(y, x) | (x, y) ∈ r^I}. The interpretation is extended to concepts and axioms according to the rightmost columns of Table 2, where |S| denotes the cardinality of the set S.

We write I ⊨ α if the interpretation I satisfies the axiom α (or α is true in I). I is a model of an ontology O (written I ⊨ O) if I satisfies every axiom in O. We say α is entailed by O, or α is a logical consequence of O (written O ⊨ α), if every model of O satisfies α. A concept C is subsumed by D w.r.t. O if O ⊨ C ⊑ D, and C is unsatisfiable w.r.t. O if O ⊨ C ⊑ ⊥. Classification is the task of computing all subsumptions A ⊑ B between atomic concepts A, B ∈ N_C; similarly, property classification of O is the computation of all subsumptions r ⊑ s between properties r, s ∈ N_R.
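For atomic concepts, classification amounts to computing a reflexive-transitive closure over the subsumptions a reasoner derives. The following toy sketch illustrates this for asserted atomic subsumptions only (a real reasoner also accounts for complex concepts; the concept names are hypothetical):

```python
def classify(atoms, asserted):
    """Compute all subsumptions A <= B between atomic concepts as the
    reflexive-transitive closure of the asserted subsumptions."""
    closure = {(a, a) for a in atoms} | set(asserted)
    changed = True
    while changed:  # naive fixpoint iteration; fine for toy inputs
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

atoms = {"HogwartsStudent", "Student", "Human"}
asserted = {("HogwartsStudent", "Student"), ("Student", "Human")}
print(("HogwartsStudent", "Human") in classify(atoms, asserted))
# -> True (subsumption obtained by transitivity)
```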

Semantics of newly introduced DL constructs

In the description generation section, we introduce two new DL constructs, to represent reduced forms of some of the existing logical forms; this subsection describes their semantics.

For a concept and a role , the semantics of is defined using the interpretation as { }

From now on, we refer to restrictions of this form as non-vacuous universal restrictions. By their semantics, at least one successor of the property is guaranteed for the instance; therefore, both of the new restrictions are referred to as non-vacuous restrictions in general.

The semantics of is defined as { x }

Running Example

We use the ontology given in Table 3 (extracted from the HarryPotter-book ontology) as the running example throughout this paper, and refer to it as the HP ontology.

Description Generation

As mentioned before, we focus on generating descriptions for each of the individuals in a given SHIQ based ontology. To generate the description of an individual, we associate with it a set that contains the constraints it satisfies as per the ontology. We call these sets the description-sets of the individuals.

The description-set (DS) of an instance a (represented as D(a)) in the ontology O is defined as follows (where A and r are a concept name and a role name respectively in O, and m and n are positive integers.)

In [E.V. and P.2015], the authors introduced a method for generating the DS of individuals — they call these sets “node-label-sets” — from a given OWL ontology, using simple SPARQL queries and a reasoner. However, they generated the description-sets for a different purpose: generating stems of multiple-choice questions.
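The construction can be sketched as collecting every entailed atomic type of an individual (a toy illustration; a real implementation would pose SPARQL queries against a reasoner-backed store, as in the cited work, whereas here entailment is simulated by a precomputed subsumption closure, and all names are hypothetical):

```python
# Toy sketch of building a description-set: gather every (entailed)
# atomic type of an individual. Entailment is simulated by a
# precomputed subsumption closure instead of a live reasoner.

asserted_types = {"harrypotter": {"HogwartsStudent"}}
subsumption_closure = {
    # each class mapped to itself and all its (entailed) super-classes
    "HogwartsStudent": {"HogwartsStudent", "Student", "Human"},
}

def description_set(individual):
    ds = set()
    for t in asserted_types.get(individual, set()):
        ds |= subsumption_closure.get(t, {t})
    return ds

print(sorted(description_set("harrypotter")))
# -> ['HogwartsStudent', 'Human', 'Student']
```

A full DS would also include the property-related restrictions (existential, universal and cardinality constraints) the individual satisfies, which this class-only sketch omits.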

A DS may contain redundant and repetitive knowledge, which needs to be removed before verbalizing, to improve both the readability and the clarity of the content. By redundant knowledge we mean those restrictions which are implied by a stricter restriction, or those which can be combined with other restrictions into a more human-understandable form. For example, consider the DS of the individual harrypotter from our running example.

 D(harrypotter) = { HogwartsStudent, Student, Human, Wizard, HalfBlood, Gryffindor, ∃hasPet.Pet, ∃hasPet.Owl, ∀hasPet.Creature, ≤1hasPet.Creature }

From the DS, consider the subset { HogwartsStudent, Student, Human }. Since HogwartsStudent ⊑ Student and Student ⊑ Human, the set { Student, Human } is redundant knowledge and can be removed while generating a description. The reason for considering the set as redundant knowledge is that being a Hogwarts student clearly implies that Harry Potter is a student and a human. Similarly, if ∃hasPet.Pet and ∃hasPet.Owl appear together (given Owl ⊑ Pet), then ∃hasPet.Pet can be considered redundant knowledge.

In order to remove such redundancies and repetitions, we propose 7 sets of entailment-based rules that can be applied to the restrictions in a DS. A description-set obtained after applying all the possible rules in the 7 rule-sets is called a redundancy-free description-set.

The rules in the 7 rule-sets are applied in order, from Rule-set 1 to Rule-set 7. The rule-sets and the corresponding rules should be applied in this order because each rule-set contains carefully chosen restriction patterns whose resulting patterns can be used for further reduction in the subsequent rule-sets. When moving from a lower rule-set to a higher one, the restrictions already consumed by a rule can be removed from the DS — this greatly reduces the number of combinations of restrictions to be considered when applying the rules in the subsequent rule-sets. Owing to the size limitation of the paper, we refrain from presenting the proof of correctness of the reduction rules here.

1. Most-specific concept selection rule

• For each class name , if there exists a , s.t. , then add to and , and remove from , if present.

2. Existential class-restrictions’ rule

• For each , if there exists a , s.t. , then add to and , and remove from , if present.

3. Universal class-restrictions’ rules

• For each , if there exists a , s.t. , then add to and , and remove from , if present.

• For each , if there exists a , s.t. , then add to and , and remove from , if present.

• For each , if there exists a , s.t. , then add and to and .

4. I-II Combination rules

• For each , if there exists a , s.t. , then add to and , and remove and from , if present.

• For each , if there exists a , s.t. , then add and to and , and remove from , if present.

• For each , if there exists a , s.t. , then add and to and , and remove and from , if present.

• For each , if there exists a , s.t. , then add and to and .

5. Cardinality class-restrictions’ rules

• For each , if there exists a , s.t. where , then add to and , and remove from , if present.

• For each , if there exists a , s.t. where , then add to and , and remove and from , if present.

• For each , if there exists a , s.t. where , then add to and , and remove from , if present.

• For each , if there exists a , s.t. where , then add to and , and remove and from , if present.

6. Non-vacuous (see the preliminaries section) class-restrictions’ rules

• For each , if there exists a , s.t. , then add to and , and remove from , if present.

• For each , if there exists a , s.t. , then add to and , and remove from , if present.

• For each , if there exists a , s.t. , then add to and , and remove from , if present.

• For each , if there exists a , s.t. , then add to and , and remove from , if present.

7. Exactly-one class-restrictions’ rules

• For each , if there exists a , s.t. , then add and to and , and remove from , if present.

• For each , if there exists a , s.t. , then add and to and , and remove from , if present.

Figure 1 shows the reduction steps of D(harrypotter). The constraints in the DS are taken two at a time, and we consider the possible applications of the rules in Rule-sets 1 to 7. First, the rule in Rule-set 1 (denoted Rule-1a) is applied repeatedly to obtain the most specific class names. Then the rule in Rule-set 2 (Rule-2a), the second rule in Rule-set 4 (Rule-4b), the second rule in Rule-set 5 (Rule-5b), and finally the fourth rule in Rule-set 6 (Rule-6d) are applied in order to reduce the property-related restrictions in the DS.
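The first step, repeatedly selecting the most specific class names, can be sketched as follows (a minimal illustration of the idea behind Rule-set 1, not the paper's implementation; `strict_subsumptions` stands in for reasoner entailments over the HP ontology):

```python
# Sketch of Rule-1a: discard any class name in the description-set that
# is a strict super-class of another class name in the same set.
# `strict_subsumptions` stands in for entailments a reasoner provides.

strict_subsumptions = {
    ("HogwartsStudent", "Student"),
    ("HogwartsStudent", "Human"),
    ("Student", "Human"),
}

def most_specific(class_names):
    """Keep only class names with no strict sub-class in the set."""
    return {
        c for c in class_names
        if not any((d, c) in strict_subsumptions for d in class_names)
    }

ds = {"HogwartsStudent", "Student", "Human", "Wizard", "HalfBlood"}
print(sorted(most_specific(ds)))
# -> ['HalfBlood', 'HogwartsStudent', 'Wizard']
```

Student and Human are removed because HogwartsStudent is entailed to be more specific than both; the later rule-sets reduce the property-related restrictions analogously.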

Linguistic Description of Individuals

For the completeness of the paper, we present the simple method we adopted to generate linguistic descriptions of individuals from their redundancy-free DSs.

The linguistic description of an individual is defined as the set of NL fragments that describe the class names and property-related constraints it satisfies. An example description of Harry Potter (individual harrypotter) from the HP ontology is:

 Harry Potter: is a Hogwarts Student, a Wizard, a Halfblood, a Gryffindor and having exactly one Owl as Pet

We consider a template similar to the following regular expression (abbreviated as regex) for generating an individual’s description.

 Individual: (“is”) ((“a”) ClassName (“,” | “and”)?)+ ((PropertyRestriction)(“,” | “and”)?)+

In the above regex, ClassName specifies the concept names in the DS. We use the rdfs:label property values of the class names as the ClassName; if an rdfs:label property is not available, the local names of the URIs are used instead. For PropertyRestriction, the property-related class restrictions in the DS are utilized. The property-related constraints are treated in parts. We first tokenize the property names in the constraints; tokenizing includes word segmentation and processing of camel case, underscores, spaces, punctuation, etc. Then, we identify and tag the verbs (in the absence of a proper verb, the phrase “related to” is used in its place) and nouns in the segmented phrase — as R-verb and R-noun respectively — using the Natural Language Toolkit (Python NLTK: http://www.nltk.org/). Some of these R-verbs are given pre-defined morphological word forms; for example, the verb ‘has’ is changed to ‘having’. We then incorporate these segmented words into a constraint-specific template to form a PropertyRestriction. For instance, the restriction ∃hasPet.Cat is verbalized to “having at least one pet as cat”, using a template of the form: ⟨R-verb⟩ at least one ⟨R-noun⟩ as ⟨ClassName⟩. The constraint-specific templates corresponding to the possible restrictions in a DS are listed in Table 5. Linguistic variations of these constraint-specific templates are also possible, to enhance readability; but since the empirical study (see the next section) serves a different purpose and involves carefully chosen participants, we limit further fluency enhancement of the texts.
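The tokenization and template-filling steps can be sketched as follows (a simplified illustration: the paper tags verbs with NLTK, whereas here a crude lookup stands in for the tagger, and the morphological handling of ‘has’ is the one example the text gives):

```python
import re

def tokenize_property(name):
    """Split a camel-case/underscored property name into lowercase words."""
    name = name.replace("_", " ")
    # insert a space at each lower-to-upper case boundary, e.g. hasPet -> has Pet
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name).lower().split()

def verbalize_existential(prop, filler):
    """Fill the 'at least one ... as ...' template for an existential
    restriction such as (some hasPet Cat)."""
    words = tokenize_property(prop)
    # crude stand-in for NLTK verb tagging + morphology ('has' -> 'having')
    verb = "having" if words[0] == "has" else "related to"
    noun = " ".join(words[1:]) if len(words) > 1 else filler.lower()
    return f"{verb} at least one {noun} as {filler.lower()}"

print(verbalize_existential("hasPet", "Cat"))
# -> having at least one pet as cat
```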

We avoid restrictions that contain ⊤ (the apex class) or ⊥ (the bottom class) when generating the description; this is purely a design decision. Including such restrictions would force us to consider new cases of the constraint-specific templates, beyond those given in Table 5, which is outside our present scope.

Empirical Evaluation

We present a case study to explore the applicability of redundancy-free descriptions of instances in validating domain knowledge. Rather than choosing an ontology under development, we study the case of validating a previously built ontology: the Plant-protection ontology (a.k.a. the PP ontology), which had been used for the empirical study in [E.V. and P.2015].

In the study, domain experts were presented with two representations of the same knowledge: one obtained by direct verbalization of the description-sets, and the other by verbalizing them after computing the corresponding redundancy-free description-sets. Direct verbalization of a DS generates texts (or descriptions) similar to those produced by an existing ontology verbalizer — we call this the traditional approach, and the other the proposed approach. Example description texts generated using the proposed and traditional approaches, from the PP ontology, the HP ontology and the Geographical Entity (GEO) ontology (https://bitbucket.org/uamsdbmi/geographical-entity-ontology/src, last accessed: 27/11/2015), are given in Table 4.

The experts were then asked to mark their degree of understanding of the knowledge on the scale: (1) poor; (2) medium; (3) good.

To measure the usefulness of our approach in validating the domain knowledge, for each of the instance descriptions the domain experts were asked to choose from the options: (1) Valid; (2) Invalid; (3) Don’t know; (4) Cannot be determined. Feedback was also collected to get suggestions for improving the system.

The PP ontology has 546 instances, 105 concepts and 15 object properties. Using an implemented prototype of the system, we generated a description text for each of the 546 instances. Since manual evaluation of all the generated descriptions is difficult, we grouped the instances based on their (redundancy-free) description-sets, and aggregated the names of the instances using suitable conjunctions (e.g., Yellow rust and Brown rust). An ontology description containing 31 sentences was thus obtained for evaluation. Three experts in the plant-protection area reviewed the verbalized descriptions.
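The grouping-and-aggregation step can be sketched as follows (a minimal illustration; "Yellow rust" and "Brown rust" appear in the study, but the third disease name and the restriction strings are invented here for the example):

```python
from collections import defaultdict

def group_by_description(descriptions):
    """Group instance names that share the same (redundancy-free)
    description-set, aggregating the names with a conjunction."""
    groups = defaultdict(list)
    for name, ds in descriptions.items():
        groups[frozenset(ds)].append(name)  # sets are unhashable; freeze them
    out = {}
    for ds, names in groups.items():
        names.sort()
        label = names[0] if len(names) == 1 \
            else ", ".join(names[:-1]) + " and " + names[-1]
        out[label] = set(ds)
    return out

descs = {
    "Yellow rust": {"FungalDisease", "affects some Wheat"},
    "Brown rust": {"FungalDisease", "affects some Wheat"},
    "Powdery mildew": {"FungalDisease", "affects some Barley"},
}
print(sorted(group_by_description(descs)))
# -> ['Brown rust and Yellow rust', 'Powdery mildew']
```

Grouping this way is what brings the 546 instance descriptions down to a reviewable number of aggregated sentences.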

Results

Does it improve the understandability?

The degree of understanding of each description by a domain expert is identified by the option (poor, medium or good) she has chosen. If there is an ambiguity in the description (due to its verbatim fidelity to OWL statements), she is expected to choose the poor or medium level of understanding. To confine the causes of ambiguity to the fidelity to OWL alone, the generated texts were (manually) edited where possible, as we are not using any sophisticated NL generation techniques.

Table 6 shows the statistics of the responses we received from the three domain experts for the descriptions generated using our approach. The overall response (fourth row) in the table corresponds to the response of the majority (at least 2 out of 3). For the descriptions generated from the redundancy-free DSs, 24 out of 31 texts were rated as “good”, whereas for those generated directly from the DSs, only 5 out of 31 texts were rated as “good” (see Table 7). This highlights the significance of the redundancy-reduction process for domain-knowledge understanding.

How helpful is it in knowledge validation?

The usefulness of the generated descriptions in validating an ontology can be assessed by looking at the number of description texts marked as “Cannot be determined”. The three options Valid, Invalid and Don’t know imply that the text is useful for reaching a conclusion, whereas the option “Cannot be determined” indicates a problem in the representation. In Table 8, we show the count of description texts marked as “Cannot be determined” by the majority of the domain experts, for the traditional and proposed approaches. In the case of the proposed approach, only 2 out of 31 descriptions were not useful for determining the quality of the ontology, whereas in the case of the traditional approach, approximately 50 percent of the descriptions were not helpful.

Domain Experts feedback and discussion

The participants agreed that, by reducing the redundancies in a description, the amount of time required for validating an instance description is reduced to a great extent.

Validation of an ontology also involves verifying the truthfulness of the property relationships in it, which is not addressed in this paper. This issue can be addressed in the future by generating description-sets for pairs of instances, and mapping them to the respective constraint(s) in the DS of the first instance. For example, given , and , then in can be mapped to in . The description of can then be generated as: “ is a and , and having some , like , as Friend.”

According to the domain experts, a persistent problem with any validation phase (especially one involving instance-wise description generation and experts validating the verbalized knowledge) is that, when the ontology becomes very large and complex, the validation phase becomes a bottleneck for the entire development cycle. One way to overcome this issue in our validation approach is to consider only a relevant subset of instances and their descriptions, so that a rough estimate of the erroneous formalisms in the ontology can be obtained quickly.

Conclusion

A novel method for generating text descriptions of the individuals of a given SHIQ ontology is proposed in this paper. The descriptions are not verbatim translations of the logical axioms of the ontology. Rather, they are generated from a description of the individual on which semantic simplification has been carried out, using the entailment-based reduction rules we propose. We find that the proposed method indeed gives redundancy-free descriptions of individuals.

Empirical studies based on a rather small ontology show that redundancy-free descriptions of the domain knowledge are helpful in understanding the formalized knowledge more effectively, and are also useful in validating it.

As future work, we plan to implement a Protégé plug-in to allow ontology developers to benefit from the suggested approach.

References

• [Androutsopoulos, Lampouras, and Galanis2014] Androutsopoulos, I.; Lampouras, G.; and Galanis, D. 2014. Generating natural language descriptions from OWL ontologies: the naturalowl system. CoRR abs/1405.6164.
• [Baader et al.2003] Baader, F.; Calvanese, D.; McGuinness, D. L.; Nardi, D.; and Patel-Schneider, P. F., eds. 2003. The description logic handbook: theory, implementation, and applications. New York, NY, USA: Cambridge University Press.
• [Bühmann and Lehmann2013] Bühmann, L., and Lehmann, J. 2013. Pattern based knowledge base enrichment. In Alani, H.; Kagal, L.; Fokoue, A.; Groth, P.; Biemann, C.; Parreira, J.; Aroyo, L.; Noy, N.; Welty, C.; and Janowicz, K., eds., The Semantic Web – ISWC 2013, volume 8218 of Lecture Notes in Computer Science. Springer Berlin Heidelberg. 33–48.
• [Cregan, Schwitter, and Meyer2007] Cregan, A.; Schwitter, R.; and Meyer, T. 2007. Sydney owl syntax - towards a controlled natural language syntax for owl 1.1. In Golbreich, C.; Kalyanpur, A.; and Parsia, B., eds., OWLED, volume 258 of CEUR Workshop Proceedings. CEUR-WS.org.
• [E.V. and P.2015] E.V., V., and P., S. K. 2015. A novel approach to generate MCQs from domain ontology: Considering DL semantics and open-world assumption. Web Semantics: Science, Services and Agents on the World Wide Web 34:40 – 54.
• [Hart, Dolbear, and Goodwin2007] Hart, G.; Dolbear, C.; and Goodwin, J. 2007. Lege feliciter: Using structured english to represent a topographic hydrology ontology. In Golbreich, C.; Kalyanpur, A.; and Parsia, B., eds., OWLED, volume 258 of CEUR Workshop Proceedings. CEUR-WS.org.
• [Hewlett et al.2005] Hewlett, D.; Kalyanpur, A.; Kolovski, V.; and Halaschek-wiener, C. 2005. Effective nl paraphrasing of ontologies on the semantic web. In End User Semantic Web Interaction Workshop (ISWC 2015).
• [Horrocks, Sattler, and Tobies2000] Horrocks, I.; Sattler, U.; and Tobies, S. 2000. Reasoning with individuals for the description logic shiq. CoRR cs.LO/0005017.
• [Jarrar, Maria, and Dongilli2006] Jarrar, M.; Maria, C.; and Dongilli, K. P. 2006. Multilingual verbalization of orm conceptual models and axiomatized ontologies. Technical report.
• [Kaljurand and Fuchs2007] Kaljurand, K., and Fuchs, N. E. 2007. Verbalizing owl in attempto controlled english. In OWLED, volume 258.
• [Kaljurand2007] Kaljurand, K. 2007. Attempto Controlled English as a Semantic Web Language. Ph.D. Dissertation, Faculty of Mathematics and Computer Science, University of Tartu.
• [Schmidt-Schauß and Smolka1991] Schmidt-Schauß, M., and Smolka, G. 1991. Attributive concept descriptions with complements. Artificial Intelligence 48(1):1–26.
• [Stevens et al.2011] Stevens, R.; Malone, J.; Williams, S.; Power, R.; and Third, A. 2011. Automating generation of textual class definitions from owl to english. J. Biomedical Semantics 2(S-2):S5.
• [Third, Williams, and Power2011] Third, A.; Williams, S.; and Power, R. 2011. Owl to english : a tool for generating organised easily-navigated hypertexts from ontologies.