Natural language lexical definitions of terms can be used as a source of knowledge in a number of semantic tasks, such as Question Answering, Information Extraction and Text Entailment. While formal, structured resources such as ontologies are still scarce and usually target a very specific domain, many linguistic resources containing dictionary definitions are available, covering not only particular domains but also wide-coverage commonsense knowledge.
However, in order to make the most of those resources, it is necessary to capture the semantic shape of natural language definitions and structure them in a way that favors both the information extraction process and the subsequent information retrieval, allowing the effective construction of semantic models from these data sources while keeping the resulting model easily searchable and interpretable. Furthermore, by using these models, systems can increase their own interpretability, benefiting from the structured data for performing traceable reasoning and generating explanations – features which are becoming even more valuable given the growing importance of Explainable AI [Gunning2017].
In this work, we propose a method for automatically building commonsense knowledge bases out of natural language dictionary definitions, which is easily extensible to any domain where natural language glossaries are available. Building upon a conceptual model based on a set of semantic roles for definitions, we classify each segment in a definition according to its relation to the entity being defined, and convert the classified data into a knowledge graph where each node is a meaningful phrase which contains a piece of self-contained information about the definiendum. Following this methodology, we processed the whole noun and verb databases of WordNet [Fellbaum1998] and built the WordNetGraph, and then used this knowledge graph to recognize text entailments in an interpretable way, providing concise justifications for the entailment decisions.
2 Related Work
The construction of structured databases from dictionary definitions has been extensively explored, and most approaches rely on syntactic parsers to identify patterns that point to relationships between words [Calzolari1991, Vossen1991, Vossen1992, Vossen and Copestake1994]. A notable early effort is LKB, a Lexical Knowledge Base [Copestake1991] built on typed feature structures, which can be seen as sets of attributes for a given concept, such as “origin”, “color”, “smell”, “taste” and “temperature” for the concept drink. The definitions from a machine-readable dictionary are parsed to extract the definiendum’s genus and differentiae, and the values expressed by the differentiae fill in the feature structure for that genus. Since the features, that is, the relevant attributes for a given entity, must be defined in advance, only a restricted domain was considered in this approach.
Dolan et al. [Dolan et al.1993] also describe an automated strategy to build a structured lexical knowledge base but, instead of an entity-attribute structure, they use syntactic parsing to identify semantic relations such as is-a and part-of in order to build a directed graph. Recski [Recski2016] also derives a graph representation from dictionary definitions, but the adopted conceptual model has only three types of edges, numbered from 0 to 2: the 0-edge represents unary predicates, and the 1- and 2-edges connect binary predicates to their arguments. Most of these approaches work at the word level, converting each single word in the definition into a separate attribute or node. In a graph knowledge base, this can increase the complexity of information retrieval, since the contents of several nodes may need to be concatenated to obtain meaningful information about an entity.
The work of Bovi et al. [Bovi et al.2015] goes beyond word-level representation, identifying multi-word expressions. They perform a syntactic-semantic analysis of textual definitions for Open Information Extraction (OIE). Although they generate a syntactic-semantic graph representation of the definitions, the resulting graphs serve only as an intermediate resource for the final goal of extracting semantic relations between the entities mentioned in the definition.
3 Graph Conceptual Model
To build the definition graph, we adopted the conceptual model proposed by Silva et al. [Silva et al.2016]. This model extends the genus-differentia definition pattern from Aristotle’s classic theory of definition [Berg1982, Lloyd1962, Granger1984] by defining a set of entity-centered semantic roles for lexical definitions. Unlike the commonly used event-centered semantic roles, which define the semantic relations holding between a predicate (the main verb in a clause) and its associated participants and properties [Màrquez et al.2008], the semantic roles of definitions express the part played by an expression in a definition, showing how it relates to the definiendum, that is, the entity being defined.
In this model, the genus concept was replaced by the more general role supertype, which can be not only the definiendum’s immediate superclass but also an ancestor higher in the concept hierarchy. The differentia component was split into two roles: differentia quality and differentia event. These three roles can be seen as representing an entity’s essential properties, while other roles, such as associated fact, purpose or accessory quality, define non-essential properties. The conceptual model is depicted in Figure 1, and Table 1 summarizes each of the roles defined in this model.
| Role | Description |
|---|---|
| Supertype | the entity’s immediate or ancestral superclass |
| Differentia quality | a quality that distinguishes the entity from the others under the same supertype |
| Differentia event | an event (action, state or process) in which the entity participates and that is mandatory to distinguish it from the others under the same supertype |
| Event location | the location of a differentia event |
| Event time | the time at which a differentia event happens |
| Origin location | the entity’s location of origin |
| Quality modifier | degree, frequency or manner modifiers that constrain a differentia quality |
| Purpose | the main goal of the entity’s existence or occurrence |
| Associated fact | a fact whose occurrence is/was linked to the entity’s existence or occurrence |
| Accessory determiner | a determiner expression that does not constrain the supertype-differentia scope |
| Accessory quality | a quality that is not essential to characterize the entity |
| [Role] particle | a particle, such as a phrasal verb complement, non-contiguous to the other role components |
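As a minimal sketch of how the role inventory in Table 1 might be represented in code, the block below defines an enumeration with an `essential` flag for the supertype and the two differentia roles, plus a labeled-segment record. The names `Role`, `Segment` and `is_particle` are hypothetical, and modelling "[Role] particle" as a flag on a segment (rather than as a separate role) is our assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Entity-centered semantic roles for dictionary definitions (see Table 1)."""
    SUPERTYPE = "supertype"
    DIFFERENTIA_QUALITY = "differentia quality"
    DIFFERENTIA_EVENT = "differentia event"
    EVENT_LOCATION = "event location"
    EVENT_TIME = "event time"
    ORIGIN_LOCATION = "origin location"
    QUALITY_MODIFIER = "quality modifier"
    PURPOSE = "purpose"
    ASSOCIATED_FACT = "associated fact"
    ACCESSORY_DETERMINER = "accessory determiner"
    ACCESSORY_QUALITY = "accessory quality"

    @property
    def essential(self) -> bool:
        """Supertype and the two differentia roles carry the essential properties."""
        return self in (Role.SUPERTYPE, Role.DIFFERENTIA_QUALITY, Role.DIFFERENTIA_EVENT)


@dataclass(frozen=True)
class Segment:
    """One labeled span of a definition.

    "[Role] particle" is modelled as a flag, so a phrasal-verb complement keeps
    the role of the verb it belongs to (a design assumption of this sketch).
    """
    text: str
    role: Role
    is_particle: bool = False
```

For instance, `Segment("whiskey", Role.SUPERTYPE)` records the supertype of the “Scotch” definition, and `Role.PURPOSE.essential` evaluates to `False`.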
This set of semantic roles captures the semantic “shape” of natural language definitions and allows the extraction of structured representations from linguistic resources, enabling them to be used as knowledge sources in a wide range of semantic tasks.
4 Construction Methodology
Structuring natural language definitions as a graph allows us to select the portions of an entity’s description that are relevant for a given reasoning task. For example, consider the WordNet definition of the concept “lake poets”, classified according to the model described in Section 3 and illustrated in Figure 2. When retrieving data related to this concept, we may be interested only in origin-related (lake poets are English poets), time-related (lake poets are poets at the beginning of the 19th century) or space-related (lake poets are poets who lived in the Lake District) information. When each of these roles is represented as a node in a graph, we can focus only on the path containing the nodes of interest. Moreover, since the definition is split into segments rather than single words, each node contains a comprehensible amount of information, avoiding the need to visit several nodes to gather intelligible phrases.
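The role-based selection described above can be sketched with a role-labeled definition and a helper that keeps the supertype plus the roles of interest. The segmentation of the “lake poets” gloss below is illustrative and assumed (it tags “English” as a differentia quality, the label used for it later in this section); the function name `focus` is hypothetical.

```python
from typing import Iterable, List, Tuple

# Illustrative segmentation of (part of) the WordNet gloss for "lake poets";
# the role labels are an assumption, not a copy of Figure 2.
LAKE_POETS: List[Tuple[str, str]] = [
    ("English", "differentia quality"),
    ("poets", "supertype"),
    ("at the beginning of the 19th century", "event time"),
    ("who lived", "differentia event"),
    ("in the Lake District", "event location"),
]


def focus(segments: List[Tuple[str, str]], roles: Iterable[str]) -> str:
    """Join the supertype with the segments playing any of `roles`, in gloss order."""
    keep = set(roles) | {"supertype"}
    return " ".join(text for text, role in segments if role in keep)


print(focus(LAKE_POETS, {"differentia quality"}))  # English poets
print(focus(LAKE_POETS, {"event time"}))  # poets at the beginning of the 19th century
print(focus(LAKE_POETS, {"differentia event", "event location"}))  # poets who lived in the Lake District
```

Because each segment is a whole phrase, every focused reading is already an intelligible sentence fragment, which is the property the graph nodes are meant to preserve.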
To generate WordNetGraph (https://github.com/Lambda-3/WordnetGraph), a knowledge graph following the RDF data model, from WordNet’s noun and verb glosses, we adopted the following methodology for classifying and structuring the definitions:
Synsets sample selection: in order to use a supervised machine learning model to classify the data, we needed an initial set of annotated definitions. To build this set, we randomly selected 2,000 WordNet synsets: 1,732 noun synsets and 268 verb synsets (the verb database is around 17% of the size of the noun database).
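The text gives only the sample sizes; a minimal sketch of drawing such a sample, assuming the noun and verb synsets are sampled separately without replacement and with a fixed seed for reproducibility, could look as follows. The function name and its parameters are hypothetical.

```python
import random
from typing import Iterable, List


def sample_synsets(noun_ids: Iterable[str], verb_ids: Iterable[str],
                   n_nouns: int = 1732, n_verbs: int = 268, seed: int = 0) -> List[str]:
    """Randomly draw an annotation sample from each part of speech (no replacement)."""
    rng = random.Random(seed)  # fixed seed so the same sample can be regenerated
    return rng.sample(list(noun_ids), n_nouns) + rng.sample(list(verb_ids), n_verbs)
```

Sampling each part of speech separately fixes the noun/verb mix of the sample exactly, instead of leaving it to chance.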
Automatic pre-annotation: the set of 2,000 definitions was automatically pre-annotated by a rule-based heuristic that takes into account the syntactic patterns identified through statistical analysis by Silva et al. [Silva et al.2016]. Using the Stanford parser [Manning et al.2014], we generated the syntactic parse tree for each definition, identified the relevant phrasal nodes and then assigned them the semantic roles most often associated with them. For example: the supertype of a noun definition is usually the innermost and leftmost noun phrase (NP) that contains at least one noun (NN); a differentia event is usually either a subordinate clause (SBAR) or a verb phrase (VP); an event location is normally a prepositional phrase (PP) inside an SBAR or VP, possibly containing a location named entity; and so on. Figure 3 shows the parse tree generated for the definition of the term “Scotch” (whiskey distilled in Scotland) and the semantic roles automatically assigned to each phrasal node.
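A minimal sketch of three of these rules, applied to a hand-built parse tree for the “Scotch” definition, is shown below. The tree shape is our assumption of what a constituency parser would return, the named-entity check for event locations is omitted, and `Node`, `find_supertype` and `pre_annotate` are hypothetical names; this is not the authors’ rule set.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Constituency-tree node; preterminals use the POS tag as label and carry a word."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None

    def words(self) -> List[str]:
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.words()]


def find_supertype(node: Node) -> Optional[Node]:
    """Leftmost, innermost NP that directly dominates a noun (NN, NNS, NNP, ...)."""
    for child in node.children:  # children first (innermost), left to right (leftmost)
        found = find_supertype(child)
        if found is not None:
            return found
    if node.label == "NP" and any(c.label.startswith("NN") for c in node.children):
        return node
    return None


def pre_annotate(root: Node) -> List[Tuple[str, str]]:
    """Assign heuristic roles to phrasal nodes (a simplified subset of the rules)."""
    spans: List[Tuple[str, str]] = []
    supertype = find_supertype(root)
    if supertype is not None:
        spans.append((" ".join(supertype.words()), "supertype"))

    def visit(node: Node) -> None:
        if node is supertype:
            return
        if node.label in ("VP", "SBAR"):
            # The VP/SBAR is a differentia event; PPs directly under it become
            # event locations (no named-entity check in this sketch).
            head = [w for c in node.children if c.label != "PP" for w in c.words()]
            spans.append((" ".join(head), "differentia event"))
            spans.extend((" ".join(c.words()), "event location")
                         for c in node.children if c.label == "PP")
            return
        for child in node.children:
            visit(child)

    visit(root)
    return spans


# Hand-built (assumed) parse of "whiskey distilled in Scotland".
scotch = Node("NP", [
    Node("NP", [Node("NN", word="whiskey")]),
    Node("VP", [
        Node("VBN", word="distilled"),
        Node("PP", [Node("IN", word="in"), Node("NP", [Node("NNP", word="Scotland")])]),
    ]),
])
print(pre_annotate(scotch))
# [('whiskey', 'supertype'), ('distilled', 'differentia event'), ('in Scotland', 'event location')]
```

Rules like these only give a first guess, which is why the pre-annotated data still had to be curated manually.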
Data curation: after the automatic pre-annotation, the definitions were manually curated with the aid of the Brat annotation tool (http://brat.nlplab.org/). Misclassifications were fixed, and segments missing a role were assigned the appropriate one. Misclassifications and missing roles stem from parser errors or insufficient information (for instance, a PP inside a VP may not contain any named entity, making it hard to distinguish between an event time and an event location). The manual data curation ensured that every segment in each definition, apart from leading determiners and conjunctions between roles (as opposed to conjunctions inside roles), was associated with a semantic role label.
Classifier training: the curated data was then used to train a Recurrent Neural Network (RNN) model designed for sequence labeling. We used the RNN implementation provided by Mesnil et al. [Mesnil et al.2015], which reports state-of-the-art results for the slot filling task. The dataset was split into training (68%), validation (17%) and test (15%) sets. The best accuracy reached during training was 80.35%.
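Sequence labelers of this kind predict one tag per token, and slot-filling corpora are commonly tagged in IOB format; whether the authors used IOB tags is our assumption. The sketch below converts role-labeled segments into per-token IOB tags (unlabeled material such as a leading determiner gets `O`) and performs a seeded 68/17/15 split. Both function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def to_iob(segments: Sequence[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Turn (segment, role) pairs into per-token IOB tags; a role of None gives "O"."""
    tagged = []
    for text, role in segments:
        for i, token in enumerate(text.split()):
            if role is None:
                tagged.append((token, "O"))  # e.g. a leading determiner
            else:
                prefix = "B-" if i == 0 else "I-"  # B marks the first token of a role
                tagged.append((token, prefix + role.replace(" ", "_")))
    return tagged


def split_dataset(items, train=0.68, valid=0.17, seed=0):
    """Shuffle, then cut into train/validation/test; the test set gets the remainder."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(train * len(items))
    n_valid = round(valid * len(items))
    return items[:n_train], items[n_train:n_train + n_valid], items[n_train + n_valid:]
```

With the 2,000 annotated definitions, this split yields 1,360 training, 340 validation and 300 test definitions.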
Database classification: the trained classifier was then used to label all of WordNet’s noun and verb definitions. For simplicity, example sentences and parenthetical text were removed from the original glosses. The classification was performed over WordNet 3.0: 82,112 noun definitions and 13,761 verb definitions were labeled.
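A minimal sketch of the gloss cleaning step, assuming raw WordNet glosses in which quoted example sentences follow the definition after a semicolon (as in `def; "example"`), is given below. The function name `clean_gloss` is hypothetical, and the exact cleaning rules used by the authors are not specified in the text.

```python
import re


def clean_gloss(gloss: str) -> str:
    """Strip quoted example sentences and parenthetical text from a WordNet gloss."""
    definition = gloss.split('; "', 1)[0]  # examples follow the definition as '; "..."'
    while True:  # repeat so nested parentheses are removed from the inside out
        stripped = re.sub(r"\([^()]*\)", "", definition)
        if stripped == definition:
            break
        definition = stripped
    definition = re.sub(r"\s+", " ", definition).strip()
    return re.sub(r"\s+([,;:.])", r"\1", definition)  # no space left before punctuation


print(clean_gloss("equipment for taking photographs (usually consisting of a lightproof "
                  "box with a lens at one end and light-sensitive film at the other)"))
# equipment for taking photographs
```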
Data post-processing: since some of the classified definitions lacked the supertype role, the labeled data went through a post-processing phase. The supertype is a mandatory component of a well-formed definition and, as detailed below, the RDF model is structured around it. Following the same syntactic rules adopted for pre-annotation, missing supertypes were identified and the boundaries of the surrounding roles were adjusted accordingly, while the rest of the classification was kept unchanged. Figure 4 shows an example of a definition (for the term “spur”) fixed in the post-processing phase.
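As a rough, token-level approximation of this repair for noun definitions, the sketch below relabels the leftmost run of noun tokens as the supertype when the classifier predicted none, so the roles that covered those tokens shrink around it and every other label is untouched. Working on POS-tagged tokens rather than on the parse tree is a simplification of our own; `ensure_supertype` and the token layout are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, POS tag, role label)


def ensure_supertype(tokens: List[Token]) -> List[Token]:
    """Relabel the leftmost run of nouns as supertype if no supertype was predicted.

    Intended for noun definitions; only the noun run changes, so the roles that
    previously covered it simply shrink around the recovered supertype.
    """
    if any(role == "supertype" for _, _, role in tokens):
        return tokens  # already well formed
    start = next((i for i, (_, pos, _) in enumerate(tokens) if pos.startswith("NN")), None)
    if start is None:
        return tokens  # no noun to promote; leave for manual inspection
    end = start
    while end < len(tokens) and tokens[end][1].startswith("NN"):
        end += 1
    return [(word, pos, "supertype" if start <= i < end else role)
            for i, (word, pos, role) in enumerate(tokens)]
```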
RDF conversion: finally, the labeled definitions were serialized in RDF format. In the final graph, a synset is a node and each role in its definition is another node. The synset node is linked to the supertype role, which is, in turn, linked to all the other roles. More specifically, a supertype linked to a role is a reified node, and this reified node is linked to the synset node. Reification is also used when a role has components, such as event time and/or location for a differentia event and quality modifier for a differentia quality. In this case, the component is linked to its main role, composing a reified node that is linked to the supertype, creating another reified node that is eventually linked to the synset node. This structure allows the relationships to be fully contextualized. As an example, consider the definition depicted in Figure 2. The node defined by the concept “poet” may be linked to several other nodes in the graph, but it is linked to the differentia quality node “English” only in the context of this definition. Supertype nodes are always represented as resources. The differentia quality and differentia event nodes can be represented either as resources, when they have components (event times and/or locations, or quality modifiers) to be linked to, or as literals otherwise. All the other roles are represented as literals, and properties are named after the roles (the complete list of the model’s properties and namespaces is available at https://github.com/Lambda-3/WordnetGraph). Figure 5 shows the simplified (without reification) RDF representation of the definition in Figure 2.
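One possible encoding of this contextualization uses the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) and emits N-Triples lines directly. This is a sketch under stated assumptions: the namespace `http://example.org/wng#` and the linking property `has_role_statement` are hypothetical, only `has_supertype` and `has_diff_qual` appear in this paper, and roles with components (which become resources) are not handled.

```python
from typing import List, Tuple

EX = "http://example.org/wng#"  # hypothetical namespace
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def iri(local: str) -> str:
    return f"<{EX}{local.replace(' ', '_')}>"


def literal(text: str) -> str:
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def definition_triples(synset: str, supertype: str,
                       roles: List[Tuple[str, str]]) -> List[str]:
    """Serialize one definition as N-Triples lines.

    synset --has_supertype--> supertype; each supertype --prop--> "role" statement
    is reified and attached to the synset, so it holds only in this definition.
    """
    s, st = iri(synset), iri(supertype)
    prefix = "".join(c if c.isalnum() else "_" for c in synset)  # safe blank-node label
    lines = [f"{s} {iri('has_supertype')} {st} ."]
    for n, (prop, value) in enumerate(roles):
        stmt = f"_:{prefix}_st{n}"  # blank node standing for the reified statement
        lines += [
            f"{stmt} <{RDF}type> <{RDF}Statement> .",
            f"{stmt} <{RDF}subject> {st} .",
            f"{stmt} <{RDF}predicate> {iri(prop)} .",
            f"{stmt} <{RDF}object> {literal(value)} .",
            f"{s} {iri('has_role_statement')} {stmt} .",  # hypothetical linking property
        ]
    return lines


for line in definition_triples("lake_poets.n.01", "poet", [("has_diff_qual", "English")]):
    print(line)
```

Because the triple `poet has_diff_qual "English"` exists only as a reified statement hanging off the “lake poets” synset, other definitions that use “poet” as their supertype do not inherit it.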
Besides WordNetGraph, which is available in both XML and N-Triples formats, we provide a set of tools (https://github.com/ssvivian/DefRelExtractor) that implement the methodology described above. Routines for pre-processing definitions to generate sample data for manual curation, post-processing the data returned by a machine learning classifier, and generating the RDF model from the classified data are freely available, along with auxiliary routines that prepare the data for external tools, such as converting it to the standoff file format required by the Brat annotation tool and generating a Python script that creates the dataset for the RNN classifier.
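For readers unfamiliar with Brat’s standoff format, the sketch below writes text-bound annotation lines (`T<id>`, a tab, the type with start and end character offsets, a tab, the covered text) for a role-labeled definition. It is not the DefRelExtractor routine; the function name and the type labels (role names title-cased with underscores, since Brat type names cannot contain spaces) are our assumptions.

```python
from typing import List, Optional, Tuple


def to_standoff(definition: str, segments: List[Tuple[str, Optional[str]]]) -> str:
    """Build brat .ann text-bound lines for role-labeled segments of `definition`.

    Offsets are character positions into the text (end exclusive); segments are
    located left to right so repeated words map to the right occurrence.
    """
    lines, cursor, n = [], 0, 0
    for text, role in segments:
        start = definition.index(text, cursor)  # raises ValueError if the text is absent
        end = start + len(text)
        cursor = end
        if role is None:
            continue  # unlabeled material, e.g. a leading determiner
        n += 1
        label = role.title().replace(" ", "_")  # e.g. "Differentia_Event"
        lines.append(f"T{n}\t{label} {start} {end}\t{text}")
    return "\n".join(lines)


print(to_standoff("whiskey distilled in Scotland",
                  [("whiskey", "supertype"), ("distilled", "differentia event"),
                   ("in Scotland", "event location")]))
```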
WordNetGraph is one of the main components of a text entailment recognition approach aimed at justifying entailment decisions where reasoning over world knowledge is required. Text entailment is defined as a directional relationship between an entailing text T and an entailed hypothesis H, holding whenever a human reading T would infer that H is most likely true [Dagan et al.2006]. Using WordNetGraph as the world knowledge base, we implemented a navigation algorithm based on distributional semantics [Freitas et al.2014] that finds a path in this graph linking T to H, and used the contents of the nodes on this path to build a human-readable justification for the entailment decision. The entailment is rejected if no path is found.
Consider, as an example, the entailment pair 39.3 from the BPI dataset (http://www.cs.utexas.edu/users/pclark/bpi-test-suite/):
39.3 T: Many cellphones have built-in digital cameras.
39.3 H: Many cellphones can take pictures.
First, we look for pairs of terms that have a strong semantic relationship and can support the entailment, and then pass these pairs as input to the graph navigation algorithm. In this example, the best pair is composed of the terms “digital camera”, our source, and “pictures”, our target. Starting from the source, we retrieve all the nodes in WordNetGraph linked to it, compute the semantic similarity between each of these nodes and the target, choose the one with the highest value as the next node to visit, and repeat this process recursively until the target is reached. The following segments (triples) are found by the navigation algorithm:
digital camera has_supertype camera
camera has_supertype equipment
equipment has_diff_qual for taking photographs
Since “photograph” and “picture” are in the same synset node, the search stops at this point, confirming the entailment and producing the following justification, built from the path segments:
A digital camera is a kind of camera
A camera is an equipment for taking photographs
Photograph is synonym of picture
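The greedy navigation described above can be sketched as follows. This is a simplified stand-in, not the distributional algorithm of [Freitas et al.2014]: the graph is a toy excerpt, the relatedness scores are hand-assigned in place of a distributional model, the extra “encodes an image digitally” edge and its `has_diff_event` property name are hypothetical and only serve to show the greedy choice, visited nodes are skipped, and the stopping test uses a crude singularization plus a same-synset lemma set.

```python
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(relation, neighbour)]
Triple = Tuple[str, str, str]


def navigate(graph: Graph, source: str, target: str,
             sim: Callable[[str, str], float],
             reached: Callable[[str, str], bool],
             max_steps: int = 10) -> Optional[List[Triple]]:
    """Greedy walk: always move to the unvisited neighbour most similar to target.

    Returns the triples on the path, or None (entailment rejected) if the walk
    dead-ends or uses up max_steps before reaching the target.
    """
    current, visited, path = source, {source}, []
    for _ in range(max_steps):
        options = [(rel, nxt) for rel, nxt in graph.get(current, []) if nxt not in visited]
        if not options:
            return None
        rel, nxt = max(options, key=lambda option: sim(option[1], target))
        path.append((current, rel, nxt))
        if reached(nxt, target):
            return path
        visited.add(nxt)
        current = nxt
    return None


# Toy graph excerpt and hand-assigned scores (stand-ins for WordNetGraph and a
# distributional model).
GRAPH: Graph = {
    "digital camera": [("has_supertype", "camera"),
                       ("has_diff_event", "encodes an image digitally")],
    "camera": [("has_supertype", "equipment")],
    "equipment": [("has_diff_qual", "for taking photographs")],
}
SCORES = {("camera", "picture"): 0.7, ("encodes an image digitally", "picture"): 0.3,
          ("equipment", "picture"): 0.5}
SYNONYMS = {"picture": {"picture", "photograph"}}  # lemmas sharing a synset


def toy_sim(node: str, target: str) -> float:
    return SCORES.get((node, target), 0.0)


def toy_reached(node: str, target: str) -> bool:
    words = {w[:-1] if w.endswith("s") else w for w in node.split()}  # crude singular
    return bool(words & SYNONYMS.get(target, {target}))


print(navigate(GRAPH, "digital camera", "picture", toy_sim, toy_reached))
# [('digital camera', 'has_supertype', 'camera'), ('camera', 'has_supertype', 'equipment'), ('equipment', 'has_diff_qual', 'for taking photographs')]
```

The returned triples are exactly the material from which a justification like the one above can be phrased; a `None` result corresponds to rejecting the entailment.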
Experiments with the BPI dataset and a sample of the Guardian Headlines dataset (https://goo.gl/4iHdbX) show that the results are comparable to those of well-established text entailment algorithms, such as tree edit distance-based [Kouylekov and Magnini2005] and classification-based [Wang and Neumann2008] approaches, while providing clear, human-like explanations, an important feature still missing in most text entailment recognition approaches. A detailed description of the entailment recognition application, including experimental results and further justification examples, can be found in [Silva et al.2018].
We presented a method for automatically building a graph world knowledge base from natural language dictionary definitions. Adopting a conceptual model based on entity-centered semantic roles, we trained a supervised machine learning classifier for automatic role labeling and then converted the labeled data into an RDF graph representation. Following this methodology, we created the WordNetGraph, a graph built from the definitions of nouns and verbs in WordNet. A set of tools implementing the methodology is also freely available.
WordNetGraph was successfully used in a text entailment recognition approach based on distributional navigation over definition graphs. Besides using paths in this graph to recognize the entailment, this approach also provides a human-readable justification for the entailment decision. Since each graph node encloses a self-contained amount of information rather than always representing single words, an intelligible justification can be built from a path made up of only a few nodes. As future work, we intend to apply this methodology to other language resources, such as Wiktionary.
Vivian S. Silva is a CNPq Fellow – Brazil.
8 Bibliographical References
- [Berg1982] Berg, J. (1982). Aristotle’s theory of definition. ATTI del Convegno Internazionale di Storia della Logica, pages 19–30.
- [Bovi et al.2015] Bovi, C. D., Telesca, L., and Navigli, R. (2015). Large-scale information extraction from textual definitions through deep syntactic and semantic analysis. Transactions of the Association for Computational Linguistics, 3:529–543.
- [Calzolari1991] Calzolari, N. (1991). Acquiring and representing semantic information in a lexical knowledge base. In Workshop of SIGLEX (Special Interest Group within ACL on the Lexicon), pages 235–243. Springer.
- [Copestake1991] Copestake, A. (1991). The LKB: a system for representing lexical information extracted from machine-readable dictionaries. In Proceedings of the ACQUILEX Workshop on Default Inheritance in the Lexicon, Cambridge.
- [Dagan et al.2006] Dagan, I., Glickman, O., and Magnini, B. (2006). The PASCAL recognising textual entailment challenge. In Machine learning challenges: evaluating predictive uncertainty, visual object classification, and recognising textual entailment, pages 177–190. Springer.
- [Dolan et al.1993] Dolan, W., Vanderwende, L., and Richardson, S. D. (1993). Automatically deriving structured knowledge bases from on-line dictionaries. In Proceedings of the First Conference of the Pacific Association for Computational Linguistics, pages 5–14. Pacific Association for Computational Linguistics Vancouver.
- [Fellbaum1998] Fellbaum, C. (1998). WordNet. Wiley Online Library.
- [Freitas et al.2014] Freitas, A., da Silva, J. C. P., Curry, E., and Buitelaar, P. (2014). A distributional semantics approach for selective reasoning on commonsense graph knowledge bases. In International Conference on Applications of Natural Language to Data Bases/Information Systems, pages 21–32. Springer.
- [Granger1984] Granger, E. H. (1984). Aristotle on genus and differentia. Journal of the History of Philosophy, 22(1):1–23.
- [Gunning2017] Gunning, D. (2017). Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA).
- [Kouylekov and Magnini2005] Kouylekov, M. and Magnini, B. (2005). Recognizing textual entailment with tree edit distance algorithms. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, pages 17–20.
- [Lloyd1962] Lloyd, A. C. (1962). Genus, species and ordered series in Aristotle. Phronesis, pages 67–90.
- [Manning et al.2014] Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., and McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), pages 55–60.
- [Màrquez et al.2008] Màrquez, L., Carreras, X., Litkowski, K. C., and Stevenson, S. (2008). Semantic role labeling: an introduction to the special issue. Computational linguistics, 34(2):145–159.
- [Mesnil et al.2015] Mesnil, G., Dauphin, Y., Yao, K., Bengio, Y., Deng, L., Hakkani-Tur, D., He, X., Heck, L., Tur, G., Yu, D., et al. (2015). Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 23(3):530–539.
- [Recski2016] Recski, G. (2016). Building concept graphs from monolingual dictionary entries. In Nicoletta Calzolari (Conference Chair), et al., editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA).
- [Silva et al.2016] Silva, V. S., Handschuh, S., and Freitas, A. (2016). Categorization of semantic roles for dictionary definitions. In Cognitive Aspects of the Lexicon (CogALex-V), Workshop at COLING 2016, pages 176–184.
- [Silva et al.2018] Silva, V. S., Freitas, A., and Handschuh, S. (2018). Recognizing and justifying text entailment through distributional navigation on definition graphs. In AAAI.
- [Vossen and Copestake1994] Vossen, P. and Copestake, A. (1994). Untangling definition structure into knowledge representation. In Inheritance, defaults and the lexicon, pages 246–274. Cambridge University Press.
- [Vossen1991] Vossen, P. (1991). Converting data from a lexical database to a knowledge base. Esprit BRA-3030 ACQUILEX Working Paper No 27.
- [Vossen1992] Vossen, P. (1992). The automatic construction of a knowledge base from dictionaries: a combination of techniques. In EURALEX, volume 92, pages 311–326.
- [Wang and Neumann2008] Wang, R. and Neumann, G. (2008). A divide-and-conquer strategy for recognizing textual entailment. In Proceedings of the Text Analysis Conference, Gaithersburg, MD.