The goal of relation extraction is to identify relations among entities in a text. It is an integral part of knowledge base population [Heng2011], question answering [Xu2016], and spoken user interfaces [Yoshino:2011:SDS:2132890.2132898]. Extracting relations reliably is still a challenging task [bunescu-mooney-2005-shortest, Guo2019AttentionGG, Luan2018], with most existing solutions relying on training data that contains a limited set of relations. These approaches cannot match patterns outside the ones specified in the training set.
In many useful cases the relations need to be customized to a specific ontology that is relevant only to a small collection of documents, making it very difficult to label enough examples. Zero-shot learning has been used to overcome this limit: for example, one can frame relation extraction as a question answering problem [Levy2017ZeroShotRE]. This approach can be quite successful, leveraging recent progress in reading comprehension: it first trains a system to extract semantic content, then applies the learned generalization to create flexible rules for relation extraction.
While impressive, question answering is itself not completely solved, with most reading comprehension corpora presenting only queries that can be answered using a single phrase [welbl-etal-2018-constructing]: the generalization stemming from a question answering formulation limits the type of rules that can be written for relation extraction. Moreover, while a question answering approach improves the recall of the extractor, it can also lower the precision of the matches due to reading comprehension mistakes.
For limited sets of documents the relations to extract can often be pinned down to a few useful sentences. For example, the relation WORKED_AT might be satisfactorily represented by a few variations around a PERSON worked for a COMPANY. For relations of this type the generalization needed is limited. Linguistic theories make it possible to generate a semantic representation that offers a useful generalization of the sentence content, while at the same time providing a framework for precise rule matching over the represented text. By using Discourse Representation Theory [Kamp1993FromDT] or a neo-Davidsonian semantics [Parsons1990] it is possible to describe a collection of sentences as a set of predicates. In these frameworks relation extraction rules become a pattern matching exercise over graphs. The works of Reddy et al. [reddy-etal-2014-large, TACL807] as well as Tiktinsky et al. [tiktinsky2020pybart] are an inspiration for this paper.
Further flexibility comes from representing words using word embeddings [MikolovSCCD13]. In this paper each lemma is associated with an entry in the GloVe dataset [pennington2014glove]. In addition, specialized entities are written as a list of embeddings.
Writing a discourse as a collection of predicates is isomorphic to a graph representation of the text. The main idea of this paper is to discover relations in the discourse by matching specific sub-graphs. Each pattern match is effectively a graph query where the data is the discourse. The main contribution of this work is two-fold: First, it suggests a way to semantically encode sentences. Secondly, it defines a method for creating a set of flexible rules for low-resource relation extraction.
2.1 Semantic representation
Sentences are transformed into graphs following a method similar to [TACL807]. We start with a dependency parser [spacy2] and apply a series of transformations to obtain a neo-Davidsonian representation of the sentence (the full list of transformations from dependency tree to neo-Davidsonian form can be found in the code repository). In this form active and passive voices are represented with the same expression, all words are lemmatized, and co-reference is added to the representation.
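The collapsing of active and passive voice can be illustrated with a minimal sketch; the helper below is an assumption for illustration, not the paper's actual transformation code, and the predicate names AGENT and PATIENT are illustrative.

```python
# Minimal sketch: active and passive voice collapse to the same
# neo-Davidsonian predicates, because in a passive clause the
# grammatical subject is the semantic patient.
def clause_to_predicates(verb, subj, obj, passive=False):
    agent, patient = (obj, subj) if passive else (subj, obj)
    return {(verb, "e1"), ("AGENT", "e1", agent), ("PATIENT", "e1", patient)}

# "ACME hired Jane."  vs  "Jane was hired by ACME."
active = clause_to_predicates("hire", "ACME", "Jane")
passive = clause_to_predicates("hire", "Jane", "ACME", passive=True)
print(active == passive)  # True: both voices yield the same expression
```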
For example, the sentence Jane is working at ACME Inc as a woodworker. She is quite taller than the average becomes a set of predicates.
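The predicate listing itself is not preserved in this version; a plausible rendering of the example above, with illustrative predicate names, would be:

```
work(e1), AGENT(e1, Jane), AT(e1, ACME Inc), AS(e1, woodworker),
tall(e2), AGENT(e2, Jane), MORE_THAN(e2, average)
```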
In this representation the sentence is a graph (Fig. 1), where the nodes are nouns, verbs, adverbs, and adjectives, and the edges are the semantic relations among them.
An additional level of semantics is added by linking together two nouns that co-refer with a dedicated co-reference edge.
2.2 Matching of words
Words are represented using the GloVe word embeddings of their lemma, together with a few tags:
Negated: A True/False value that indicates whether a word is associated with a negation: if a verb is negated, the negation does not appear as a new node; rather, the verb is flagged using this tag. In this way work can never match does not work.
Named Entity Type: A label indicating the entity type of the node as per Ontonotes 5.0 notation [Hovy:2006:O9S:1614049.1614064].
Node type: Whether it is a verb, a noun, an adjective, or an adverb.
For example, the noun is represented internally as
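The original listing is not preserved in this version; a plausible sketch of the internal representation of the noun woodworker, with illustrative field names (the exact schema is in the code repository), is:

```python
# Hypothetical internal representation of the noun "woodworker";
# field names are illustrative, not the paper's exact schema.
woodworker = {
    "lemma": "woodworker",
    "vector": "<300-d GloVe embedding of 'woodworker'>",
    "negated": False,        # Negated tag (Sec. 2.2)
    "entity_type": "",       # no OntoNotes entity label for this noun
    "node_type": "NOUN",     # verb / noun / adjective / adverb
}
print(woodworker["node_type"])  # NOUN
```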
Two words match if the dot product between their lemmas' embeddings is greater than a specific threshold and all the other tags coincide. For example, the words carpenter and woodworker match within the chosen threshold. This solution can in principle be augmented with an external ontology, where synonyms and hypernyms would trigger a match as well.
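A minimal sketch of this matching test follows; the toy 3-d vectors stand in for GloVe embeddings, the similarity is normalised here for the toy values, and the threshold is illustrative (the paper does not state one).

```python
from math import sqrt

def similarity(u, v):
    """Normalised dot product of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def words_match(w1, w2, threshold=0.8):
    """Match if embedding similarity exceeds the threshold
    and all other tags coincide."""
    same_tags = all(w1[k] == w2[k] for k in ("negated", "entity_type", "node_type"))
    return similarity(w1["vector"], w2["vector"]) > threshold and same_tags

carpenter = {"vector": [0.9, 0.1, 0.0], "negated": False, "entity_type": "", "node_type": "NOUN"}
woodworker = {"vector": [0.8, 0.2, 0.1], "negated": False, "entity_type": "", "node_type": "NOUN"}
print(words_match(carpenter, woodworker))  # True with these toy vectors
```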
In addition, the system allows a set of words to be clustered under the same definition. All words within the threshold distance of the definition words trigger a match. For example, the word tome would match the word book, thus falling into the LITERATURE category.
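A sketch of such a cluster follows; the LITERATURE definition words, the toy 2-d vectors, and the threshold are all illustrative assumptions.

```python
from math import sqrt

def similarity(u, v):
    """Normalised dot product of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical user-defined cluster: lemma -> toy embedding.
LITERATURE = {"book": [1.0, 0.1], "novel": [0.9, 0.3]}

def in_cluster(vector, cluster, threshold=0.8):
    """A word falls into the cluster if it is within the threshold
    distance of any of the cluster's definition words."""
    return any(similarity(vector, v) > threshold for v in cluster.values())

tome = [0.95, 0.15]  # toy embedding for "tome"
print(in_cluster(tome, LITERATURE))  # True: "tome" matches "book"
```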
2.3 Matching of sentences
As described before, a text becomes a set of graphs (the discourse graph). Rules have two components: a MATCH clause, which defines the trigger for the rule, and a CREATE clause, which creates the relation edge. Relations must connect two entities. I have chosen the symbol # to mark the items that need a relation between them. For example, the sentence Jane#1 works at Acme#2 tags Jane and Acme for an edge to connect them.
The matching sentence can contain Named Entities (PERSON, ORG, DATE, etc) as well as an internally-defined entity (Sec. 2.2).
An example of two rules is the following:
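The original rule listing is not preserved in this version; the reconstruction below is illustrative of the syntax described above (the clause layout and the relation names WORKS_AT and TALL_EMPLOYEE_OF are hypothetical):

```
MATCH:  A PERSON#1 works at an ORG#2.
CREATE: (#1 WORKS_AT #2)

MATCH:  A PERSON#1 works at an ORG#2. The PERSON is tall.
CREATE: (#1 TALL_EMPLOYEE_OF #2)
```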
Please note that a MATCH clause is written as a sentence but is internally parsed into a graph. A rule is triggered if this semantic representation is a sub-graph of the discourse graph. Two nodes are considered equal if they match according to the method in Sec. 2.2. The rules are represented as simple pattern matching rules as in Fig. 1.
Notice also that the second rule in the above example specifies more than one sentence. This is because the MATCH clause can be text as complex and free-flowing as the documents being parsed. The trigger sentences also resolve co-reference: in the second rule, the person that works is the same person that is tall.
This is an advantage of using an internal semantic representation: one could add more complex manipulations of the sentences, where simple logical constraints (and/or) are applied or information is extracted from mathematical formulas.
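The sub-graph triggering described above can be sketched with a small backtracking matcher. This is a minimal stand-in, not the paper's implementation: graphs are lists of (head, relation, tail) edges, the PERSON/ORG placeholders and the toy ENTITY_TYPE table replace a real NER step, and exact lemma equality replaces the embedding-based match of Sec. 2.2.

```python
PLACEHOLDERS = {"PERSON", "ORG"}
ENTITY_TYPE = {"Jane": "PERSON", "ACME": "ORG"}  # toy NER output

def node_matches(pat, disc):
    """Placeholders match by entity type; other nodes by exact lemma
    (the real system uses the embedding match of Sec. 2.2)."""
    if pat in PLACEHOLDERS:
        return ENTITY_TYPE.get(disc) == pat
    return pat == disc

def rule_fires(pattern, discourse, binding=None):
    """Backtracking search for a consistent mapping of pattern nodes
    onto discourse nodes that covers every pattern edge."""
    binding = binding or {}
    if not pattern:
        return True
    (h, r, t), rest = pattern[0], pattern[1:]
    for dh, dr, dt in discourse:
        if dr != r:
            continue
        # A node already bound to a different discourse node cannot rebind.
        if binding.get(h, dh) != dh or binding.get(t, dt) != dt:
            continue
        if node_matches(h, dh) and node_matches(t, dt):
            if rule_fires(rest, discourse, {**binding, h: dh, t: dt}):
                return True
    return False

discourse = [("work", "AGENT", "Jane"), ("work", "AT", "ACME")]
pattern = [("work", "AGENT", "PERSON"), ("work", "AT", "ORG")]
print(rule_fires(pattern, discourse))  # True: the rule is triggered
```

When a match is found, the bindings of the #-marked placeholders give the two endpoints of the new relation edge.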
Each rule behaves according to the method defined above. When a graph triggers a rule, an edge is created in the relations graph, as shown in Fig. 1. In this final representation all of the discourse structures have disappeared, and knowledge is condensed onto the pre-defined relations.
3 Limitations and future work
I have presented a flexible rule-based relation extractor for low-resource settings. Flexible rules can be created, allowing a relation extractor over specialized ontologies to be built quickly. The main advantage of this approach is control over the rules and precision in the extracted content. An extension of the system should allow customized ontologies to be used for word matching. Moreover, more named entity types should be included, possibly allowing for specialized NER systems within the internal pipeline.
As a final limitation, the system does not yet assign a temporal dimension to events. This information should be extracted from verb tenses and added to the discourse graph.