Event extraction [DBLP:conf/kdd/RitterMEC12] is the process of extracting named entities [DBLP:conf/naacl/LampleBSKD16], event triggers [DBLP:conf/acl/BronsteinDLJF15], and their relationships from real-world corpora. Named entities are text spans referring to predefined classes (e.g., person names, company names, and locations), while event triggers are words that express the types of events in texts [DBLP:conf/acl/BronsteinDLJF15] (e.g., the word “hire” may trigger an “employ” event type). In the literature, named entities and triggers are connected, and the named entities that play particular roles for a given trigger [DBLP:conf/naacl/ChenJ09] are called the arguments of the corresponding event.
Currently, most existing works divide event extraction into two independent sub-tasks: named entity recognition [DBLP:conf/naacl/LampleBSKD16] and trigger labeling [DBLP:conf/acl/BronsteinDLJF15]. These sub-tasks are typically formulated as multi-class classification problems, and many works apply sequence-to-sequence labeling methods that translate a sentence into a sequence of tags [DBLP:journals/tacl/ChiuN16]. From our investigation, one problem of these sequence-to-sequence methods is that they ignore the order of the output tags, which makes it difficult to precisely annotate the different parts of an entity. To address this issue, some methods [MaH16, DBLP:conf/www/AlzaidyCG19] incorporate a conditional random field (CRF) module to enforce order constraints on the annotated tags.
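To make the order-constraint problem concrete, the sketch below (our own toy illustration, not code from any cited system) checks the BIO validity rule that such a CRF layer effectively enforces: an “I-” tag may only continue a chunk of the same type.

```python
# Toy illustration of the BIO order constraint a CRF layer enforces:
# an "I-" tag is only valid right after "B-" or "I-" of the same chunk type.
def is_valid_bio(tags):
    """Return True if the BIO tag sequence respects order constraints."""
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            chunk = tag[2:]
            if prev not in (f"B-{chunk}", f"I-{chunk}"):
                return False
        prev = tag
    return True

print(is_valid_bio(["B-PER", "I-PER", "O"]))  # True
print(is_valid_bio(["O", "I-PER", "O"]))      # False: I-PER cannot follow O
```

A plain per-token softmax scores each tag independently and can emit the invalid second sequence; a CRF layer learns transition scores that suppress such sequences.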
Since entities and triggers are naturally connected around events, recent works try to extract them jointly from corpora. Early methods apply pipeline frameworks with predefined lexical features [DBLP:conf/acl/LiJH13], which lack generality across applications. Recent works leverage the structural dependency between entities and triggers [DBLP:conf/naacl/YangM16, DBLP:conf/ijcai/ZhangQZLJ19] to further improve the performance of both the entity and trigger identification sub-tasks.
Although existing works have achieved comparable performance on jointly extracting entities and triggers, these approaches still suffer from a major limitation: they lose the co-occurrence relationships between entities and triggers. Many existing methods determine the triggers and entities separately and then match the entities with the triggers [DBLP:conf/naacl/YangM16, DBLP:conf/aaai/NguyenN19]. In this way, the co-occurrence relationships between entities and triggers are ignored, even though pre-trained features or prior data are introduced to achieve better performance. It is also challenging to capture effective co-occurrence relationships between entities and their triggers: we observed in our experiments that most entities and triggers co-occur only sparsely (or indirectly) throughout a corpus. This sparsity exacerbates the problem of losing co-occurrence relationships mentioned above.
To address the aforementioned challenge, the core insight of this paper is that in the joint-event-extraction task, the ground-truth annotations for triggers can be leveraged to supervise the extraction of entities, and vice versa. Based on this insight, we propose a novel method that extracts structural information from corpora by utilizing the co-occurrence relationships between triggers and entities. Furthermore, to fully address the aforementioned sparse co-occurrence relationships, we model the entity-trigger co-occurrence pairs as a heterogeneous information network (HIN) and supervise the trigger extraction by inferring the entity distribution from given triggers based on the indirect co-occurrence relationships collected along the meta-paths of this HIN.
Figure 1 illustrates the process by which our proposed method collects indirect co-occurrence relationships between entities and triggers. Figure 1(a) is a sub-graph of the “entity-trigger” HIN for the ACE 2005 corpus [ldc]. Figure 1(c) compares the entity distributions inferred from given triggers based on the direct adjacency matrix with those inferred from the meta-path adjacency matrix. From this figure, we observe that a trigger does not necessarily connect to all entities directly, and that the direct-adjacency-based distribution is concentrated on a few entities, while the meta-path-based distribution is spread over a larger number of entities. This shows that a model can collect indirect co-occurrence patterns between entities and triggers based on the meta-path adjacency matrix of an “entity-trigger” HIN. Moreover, the obtained indirect patterns can be applied to improve the performance of extracting both entities and triggers.
Based on the aforementioned example and analysis, we propose a neural network to extract event entities and triggers. Our model is built on top of the sequence-to-sequence labeling framework, and its inner parameters are supervised by both the ground-truth annotations of sentences and the “entity-trigger” co-occurrence relationships. Furthermore, to fully exploit the indirect “entity-trigger” co-occurrence relationships, we propose the Cross-Supervised Mechanism (CSM) based on the HIN. CSM alternately supervises the entity and trigger extraction with the indirect co-occurrence patterns mined from a corpus. It builds a bridge between triggers and entities by collecting their latent co-occurrence patterns along the meta-paths of the corpus’s heterogeneous information network; the obtained patterns are then applied to boost the performance of entity and trigger extraction alternately. We call this process the “cross-supervised” mechanism. The experimental results show that our method achieves higher precision and recall than several state-of-the-art methods.
In summary, the main contributions of this paper are as follows:
We formalize the joint-event-extraction task as sequence-to-sequence labeling with a combined tag-set, and design a novel model, CSM, that considers the indirect “entity-trigger” co-occurrence relationships to improve the performance of joint-event-extraction.
We are the first to use the indirect “entity-trigger” co-occurrence relationships (encoded in a HIN) to improve the joint-event-extraction task. With the co-occurrence relationships collected via the meta-path technique, our model is more precise than current methods without requiring any predefined features.
Our experiments on real-world datasets show that, with the proposed cross-supervised mechanism, our method achieves better performance on the joint-event-extraction task than other related alternatives.
The remainder of this paper is organized as follows. In Section II, we first introduce some preliminary knowledge about event extraction and HIN, and also formulate the problem. Section III presents our proposed model in detail. Section IV verifies the effectiveness of our model and compares it with state-of-the-art methods on real-world datasets. Finally, we conclude this paper in Section V.
We formalize the notations related to joint-event-extraction and heterogeneous information networks.
II-A The Joint-Event-Extraction Task
Sequence-to-sequence is a popular framework for event extraction [DBLP:journals/tacl/ChiuN16] and has been widely adopted in many recent works. These methods annotate each token of a sentence with one tag from a pre-defined tag-set. In this way, a model based on the sequence-to-sequence framework learns the relationship between original sentences and annotated tag sequences. Recurrent Neural Networks (RNNs) [DBLP:conf/nips/SutskeverVL14] have shown promising performance on sequence-to-sequence learning problems; therefore, many recent works [MaH16, NguyenCG16] apply RNNs to perform sequence-to-sequence event extraction.
Combined Annotation Tag-Set. To extract entities and trigger words jointly under the sequence-to-sequence framework, one way is to extend the original tag-set to a combined tag-set of entity types and trigger types, i.e., $\mathcal{T} = \mathcal{T}_e \cup \mathcal{T}_t$, where $\mathcal{T}_e$ and $\mathcal{T}_t$ represent the set of entity types and the set of trigger types, respectively.
Given a sentence $S = (w_1, w_2, \ldots, w_n)$, where the $w_i$ are tokens ($1 \le i \le n$), joint-event-extraction is defined as the process of annotating each $w_i$ ($1 \le i \le n$) with one of the tags in the set $\mathcal{T}$. This results in an annotated sequence $A = (a_1, a_2, \ldots, a_n)$, where $a_i \in \mathcal{T}$. Joint event extraction then becomes a sequence-to-sequence labeling [MaH16] which transforms a token sequence into a tag sequence.
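For illustration, a combined BIO-style tag-set can be assembled from the two type inventories; the type names below are a small hypothetical subset, not the full inventory of any corpus:

```python
# Sketch of the combined tag-set: BIO tags for entity types and trigger
# types share one label space, so a single sequence labeler covers both.
ENTITY_TYPES = ["PER", "ORG", "GPE"]
TRIGGER_TYPES = ["Movement", "Personnel"]

def combined_tag_set(entity_types, trigger_types):
    tags = {"O"}  # the outside tag
    for t in entity_types + trigger_types:
        tags.update({f"B-{t}", f"I-{t}"})
    return sorted(tags)

TAGS = combined_tag_set(ENTITY_TYPES, TRIGGER_TYPES)
print(len(TAGS))  # 2 tags per type plus "O" -> 11
```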
Sequence-to-Sequence Labeling. To train such a labeling model, the cross-entropy loss [DBLP:journals/anor/BoerKMR05] is commonly introduced. The cross-entropy loss function is defined as follows:

$$\mathcal{L}_s = -\sum_{i=1}^{n} \sum_{t \in \mathcal{T}} q(t \mid w_i) \log p(t \mid w_i),$$

where $p(t \mid w_i)$ is the probability that the model annotates token $w_i$ with tag $t$, and $q(t \mid w_i)$ is the probability that an oracle model annotates $w_i$ with $t$ ($t \in \mathcal{T}$). Within the sequence-to-sequence labeling framework, entities and triggers can be recognized simultaneously by mapping the token sequence (of a sentence) to a combined tag sequence.
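The loss above can be computed directly; the following numpy sketch (with made-up probabilities, and a small epsilon for numerical safety) sums the token-level cross-entropy against one-hot oracle distributions:

```python
import numpy as np

# Numpy sketch of the sequence labeling loss: q is the one-hot oracle
# distribution over tags for each token, p is the model's predicted softmax.
def sequence_cross_entropy(p, q, eps=1e-12):
    """p, q: (n_tokens, n_tags) arrays; returns the summed cross-entropy."""
    return float(-np.sum(q * np.log(p + eps)))

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
q = np.array([[1.0, 0.0, 0.0],   # oracle tag for token 1
              [0.0, 1.0, 0.0]])  # oracle tag for token 2
loss = sequence_cross_entropy(p, q)
print(round(loss, 4))  # -(log 0.7 + log 0.8) ~ 0.5798
```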
Generally, an event is modeled as a structure consisting of elements such as event triggers and entities in different roles [NguyenCG16]. As shown in Figure 1, the event factors [DBLP:conf/acl/ChenLZLZ17] from sentences accumulate into a heterogeneous information network [DBLP:journals/tkde/ShiLZSY17] with nodes of different types. Furthermore, we observe that all edges (direct connections) in Figure 1 run between triggers and entities, implying that named entities and triggers are contexts for each other. Intuitively, the performance of a joint-event-extraction model may degrade if it annotates triggers without the supervision of entities, or annotates entities without the supervision of triggers.
II-B “Entity-Trigger” Heterogeneous Information Network
Given a corpus $D$, an “entity-trigger” heterogeneous information network (HIN) is a weighted graph $G = (V, E, W)$, where $V$ is a node set of entities and triggers; $E$ is an edge set, in which an edge $(u, v) \in E$ ($u, v \in V$) denotes that $u$ and $v$ co-occur in a sentence of $D$; and $W$ is a set of weights, in which the weight $w_{uv} \in W$ of an edge $(u, v) \in E$ is the frequency with which $u$ and $v$ co-occur in sentences of $D$. Furthermore, $G$ is associated with a node type mapping function $\phi: V \to \mathcal{T}$ and a link type mapping function $\psi: E \to \mathcal{R}$, where $\mathcal{T}$ is the combined annotation tag-set and $\mathcal{R}$ denotes the set of predefined link types.
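Under these definitions, an “entity-trigger” HIN can be accumulated from annotated sentences; the sketch below uses two made-up sentences in the spirit of the Figure 1 example:

```python
from collections import Counter
from itertools import product

# Build a toy "entity-trigger" HIN: nodes keep their type (the mapping phi),
# and edge weights count sentence-level entity-trigger co-occurrence (w_uv).
sentences = [
    {"entities": [("U.S. troops", "PER"), ("Iraq", "GPE")],
     "triggers": [("go", "Movement")]},
    {"entities": [("most people", "PER"), ("the country", "GPE")],
     "triggers": [("go", "Movement")]},
]

node_type = {}           # phi: node -> type
edge_weight = Counter()  # w_uv: co-occurrence frequency
for sent in sentences:
    for (name, t) in sent["entities"] + sent["triggers"]:
        node_type[name] = t
    # Only entity-trigger pairs are connected (cf. Figure 1).
    for (ent, _), (trig, _) in product(sent["entities"], sent["triggers"]):
        edge_weight[(ent, trig)] += 1

print(node_type["go"])              # Movement
print(edge_weight[("Iraq", "go")])  # 1
```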
In particular, an “entity-trigger” HIN can be obtained by treating the co-occurrence relationships between entities and triggers as edges. As illustrated in Figure 1, “entity-trigger” HINs are usually sparse, since entities do not directly connect (or co-occur) with all triggers, and vice versa. In order to collect this indirect information, we resort to meta-paths [DBLP:journals/tkde/ShiLZSY17] over the “entity-trigger” HIN.
Meta-Path [DBLP:journals/tkde/ShiLZSY17]. A meta-path is a sequence $\rho = T_1 \xrightarrow{R_1} T_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} T_{l+1}$, where $l$ is the length of this path and $T_k \in \mathcal{T}$ ($1 \le k \le l+1$). Generally, $\rho$ can be abbreviated as $T_1 T_2 \cdots T_{l+1}$.
As shown in Figure 1(a), given the two basic paths “U.S. troops-go-Iraq” and “most people-go-the country” in the ACE 2005 corpus [ldc], the corresponding meta-path for both basic paths is “PER-Movement-GPE”, where “Movement” is a trigger type, and “PER” and “GPE” are entity types. This observation shows that entities of types “PER” and “GPE” are indirectly connected through the given meta-path in ACE 2005.
Since meta-paths are routed over node types, they are much more general than direct paths. Moreover, meta-paths encode the indirect co-occurrence relationships between triggers and entities. Therefore, we can collect the latent information in the “entity-trigger” HIN along meta-paths to alleviate the sparse co-occurrence issue between entities and triggers.
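A toy numpy example (our own illustration, not the paper’s exact computation) of why meta-paths reveal indirect co-occurrence: multiplying the type-restricted adjacency blocks counts “PER-Movement-GPE” paths, connecting entity pairs that never share a sentence:

```python
import numpy as np

# Adjacency block between PER entities (rows) and Movement triggers (cols),
# and between Movement triggers (rows) and GPE entities (cols); toy counts.
A_per_mov = np.array([[1, 0],
                      [1, 0]])
A_mov_gpe = np.array([[1, 1],
                      [0, 0]])
paths = A_per_mov @ A_mov_gpe  # "PER-Movement-GPE" path counts
print(paths)
# Each PER entity now reaches both GPE entities through the shared trigger,
# even though no single sentence connects every PER-GPE pair directly.
```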
II-C Problem Formulation
In this section, we formalize the problem of joint-event-extraction by utilizing the co-occurrence relationships between entities and triggers (abbreviated as co-occurrence relationships in the following part) in a HIN.
Joint-Event-Extraction via HIN. Given a corpus $D$, its “entity-trigger” HIN $G$, and a set of meta-paths $\mathcal{P}$, the task of joint-event-extraction via HIN is to map the token sequences (of sentences) in $D$ to sequences of tags in $\mathcal{T}$, guided by the co-occurrence patterns in $G$ collected along the meta-paths in $\mathcal{P}$.
Intuitively, the “entity-trigger” HIN of a given corpus forms a knowledge graph that conforms to the corpus and can be used to supervise the extraction processes for both named entities and event triggers. In other words, if an annotation result (for entities and triggers) from the corpus violates its corresponding “entity-trigger” HIN, the entities and triggers in this result must be ill-annotated.
III Our Proposed Model
As shown in Figure 2, we define our task as a two-step process. First, the model performs sequence-to-sequence labeling to annotate all entities and triggers, as shown on the left-hand side of Figure 2. Then, it supervises the annotated results by inferring the probabilities of the predicted entities and triggers from the annotated results and the indirect co-occurrence relationships, as shown on the right-hand side of Figure 2. To predict the entity or trigger distributions, we propose the meta-path based adjacency matrix for a given HIN and apply it to alternately derive the entity and trigger distributions from each other. We name our method the Cross-Supervised Mechanism (CSM) and implement it with a well-designed neural cross-supervised layer (NCSL). Moreover, since NCSL can be linked with any differentiable loss function, it can easily be extended to many other event-extraction models. In this section, we elaborate on each part of our proposed model.
III-A Cross-Supervised Mechanism
To incorporate the co-occurrence relationships into the joint-event-extraction process, we propose the cross-supervised mechanism. It is based on the observation that triggers and entities are prevalently connected in an “entity-trigger” HIN (cf. Figure 1). Given this observation, in a corpus the trigger of an event indicates the related entities, and meanwhile the entities of an event also carry evidence for the corresponding trigger. Therefore, an extraction result can be evaluated by comparing the entities (or triggers) predicted from the extracted triggers (or entities) with the ground-truth entities (or triggers). To implement this idea, we first define the probability distributions for entities and triggers.
Entity and Trigger Distribution. The entity distribution $\pi_e(t)$ is a probability function defined for any entity type $t \in \mathcal{T}_e$, while the trigger distribution $\pi_t(t)$ is a probability function defined for any trigger type $t \in \mathcal{T}_t$. With these notations, the cross-supervised mechanism can be defined as follows.
Cross-Supervised Mechanism. Given the ground-truth entity distribution $\pi_e$ and trigger distribution $\pi_t$ for a corpus $D$ and the corresponding HIN $G$, suppose $\hat{\pi}_e$ and $\hat{\pi}_t$ are the entity and trigger distributions derived from the extraction results of a model. Then the target of the cross-supervised mechanism is to minimize the following loss function:

$$\mathcal{L}_c = \Delta\big(f_e(\hat{\pi}_t), \pi_e\big) + \Delta\big(f_t(\hat{\pi}_e), \pi_t\big),$$

where $f_e$ and $f_t$ are the functions that predict the entity and trigger distributions from the extracted results based on $G$, and $\Delta$ is a function that computes the difference between two distributions. Intuitively, $\mathcal{L}_c$ measures the loss between the predicted and ground-truth distributions for both entities and triggers.
To alternately predict the entities (or triggers) from given triggers (or entities) in a HIN, the adjacency matrix of the “entity-trigger” HIN is a natural tool for converting one distribution (of entities or triggers) into the other.
Entity-Trigger Direct Adjacency Matrix. The entity-trigger direct adjacency matrix is an $|\mathcal{T}_e| \times |\mathcal{T}_t|$ matrix $M$, where the entry $M_{ij}$ is the frequency with which entities of the $i$-th entity type and triggers of the $j$-th trigger type co-occur in sentences of a corpus.

With the entity-trigger direct adjacency matrix, the alternate predicting functions $f_e$ and $f_t$ can be computed as:

$$f_e(\hat{\pi}_t) = \mathrm{norm}(M \hat{\pi}_t), \qquad f_t(\hat{\pi}_e) = \mathrm{norm}(M^{\top} \hat{\pi}_e),$$

where $\mathrm{norm}(\cdot)$ rescales a non-negative vector so that its components sum to one; $\hat{\pi}_e$ and $\pi_e$ are $|\mathcal{T}_e|$-dimensional vectors, and $\hat{\pi}_t$ and $\pi_t$ are $|\mathcal{T}_t|$-dimensional vectors. However, since the “entity-trigger” HIN may be sparse (cf. Figure 1(c)), it is challenging to precisely predict the entity and trigger distributions from such inadequate evidence. Thus, we resort to the meta-path technique to exploit the sparse information in the HIN.
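A numpy sketch of this conversion, with made-up co-occurrence counts; the renormalization step is our assumption about how the matrix-vector product is turned back into a probability vector:

```python
import numpy as np

# Toy direct adjacency matrix M: rows are entity types, cols are trigger
# types; entries are co-occurrence frequencies (made-up numbers).
M = np.array([[3.0, 0.0],
              [1.0, 2.0]])

def predict_entities(trigger_dist):
    """Convert a trigger distribution into an entity distribution via M."""
    scores = M @ trigger_dist
    return scores / scores.sum()  # renormalize to a probability vector

pi_t = np.array([0.8, 0.2])       # an extracted trigger distribution
pi_e_hat = predict_entities(pi_t)
print(pi_e_hat)  # biased toward the entity type that co-occurs more often
```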
Meta-Path based Adjacency Matrix. In the same setting as the direct adjacency matrix, given a set of meta-paths $\mathcal{P}$, the meta-path based adjacency matrix is an $|\mathcal{T}_e| \times |\mathcal{T}_t|$ matrix $M^{\mathcal{P}}$, whose entries are defined as:

$$M^{\mathcal{P}}_{ij} = \sum_{\rho \in \mathcal{P}} \Pr(i \leadsto j \mid \rho),$$

where $\Pr(i \leadsto j \mid \rho)$ is the reachable probability from type $i$ to type $j$ along a given meta-path $\rho$. Suppose $\rho = T_1 T_2 \cdots T_{l+1}$; then $\Pr(i \leadsto j \mid \rho)$ is computed by chaining the single-step transition probabilities along the path:

$$\Pr(i \leadsto j \mid \rho) = \sum_{u_1, \ldots, u_{l+1}} \prod_{k=1}^{l} p(u_{k+1} \mid u_k, T_{k+1}),$$

where the sum runs over node sequences $u_1, \ldots, u_{l+1}$ with $\phi(u_1) = i = T_1$ and $\phi(u_{l+1}) = j = T_{l+1}$, $\phi(u_k)$ is the type of node $u_k$, and $T_k$ is the $k$-th type on path $\rho$ ($1 \le k \le l+1$). The single-step probability $p(v \mid u, T_{k+1})$ is the reachable probability from node $u$ to node $v$ by considering their types, and can be obtained through a meta-path based random walk [DBLP:journals/tkde/ShiHZY19]:

$$p(v \mid u, T_{k+1}) = \begin{cases} \dfrac{w_{uv}}{\sum_{v' \in N_{T_{k+1}}(u)} w_{uv'}} & \text{if } \phi(v) = T_{k+1}, \\ 0 & \text{otherwise,} \end{cases}$$

where $w_{uv}$ is the frequency with which $u$ and $v$ co-occur in sentences and $N_{T_{k+1}}(u)$ is the set of direct neighbors of node $u$ matching the next type $T_{k+1}$ on path $\rho$. Replacing $M$ with $M^{\mathcal{P}}$ yields the meta-path based predicting functions $f^{\mathcal{P}}_e(\hat{\pi}_t) = \mathrm{norm}(M^{\mathcal{P}} \hat{\pi}_t)$ and $f^{\mathcal{P}}_t(\hat{\pi}_e) = \mathrm{norm}\big((M^{\mathcal{P}})^{\top} \hat{\pi}_e\big)$, which compute the entity and trigger meta-path based distributions, respectively.
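One step of such a meta-path-constrained walk can be sketched as follows (toy neighbors and weights; only nodes matching the next type on the path are candidates):

```python
# One step of a meta-path based random walk: from node u, move only to
# neighbors whose type matches the next type on the path, with probability
# proportional to the co-occurrence weight w_uv. Toy data below.
neighbors = {  # u -> [(v, type of v, weight w_uv)]
    "go": [("U.S. troops", "PER", 2.0),
           ("Iraq", "GPE", 1.0),
           ("the country", "GPE", 3.0)],
}

def step_probs(u, next_type):
    cands = [(v, w) for v, t, w in neighbors[u] if t == next_type]
    total = sum(w for _, w in cands)
    return {v: w / total for v, w in cands}

probs = step_probs("go", "GPE")  # next type on the path is GPE
print(probs)  # {'Iraq': 0.25, 'the country': 0.75}
```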
III-B Neural Cross-Supervised Layer
With the aforementioned discussion, we can evaluate the plausibility of the trigger distribution based on the entities annotated by a model, or evaluate the plausibility of the entity distribution based on the triggers annotated by the same model. We name this evaluation process cross-supervision and implement it in the NCSL. By substituting Eq. 8 and Eq. 9 into the corresponding terms of Eq. 2, NCSL evaluates this difference with two concatenated KL-divergence losses [DBLP:conf/iccv/GoldbergerGG03]:

$$\mathcal{L}_c = \mathrm{KL}\big(f^{\mathcal{P}}_e(\hat{\pi}_t) \,\|\, \pi_e\big) + \mathrm{KL}\big(f^{\mathcal{P}}_t(\hat{\pi}_e) \,\|\, \pi_t\big),$$

where $\hat{\pi}_e$ and $\hat{\pi}_t$ are the distributions of entities and triggers predicted by the sequence-to-sequence labeling, and $\pi_e$ and $\pi_t$ are the ground-truth entity and trigger distributions, respectively. In this way, NCSL incorporates the cross-supervised information for both triggers and entities into its process.
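A numpy sketch of such a two-term KL loss; the distributions are made up, and the KL direction shown is one plausible choice rather than the paper’s definitive formulation:

```python
import numpy as np

# KL divergence between two discrete distributions, with a small epsilon
# to avoid log(0); the NCSL-style loss sums one term per direction
# (predicted entities vs. ground truth, predicted triggers vs. ground truth).
def kl(p, q, eps=1e-12):
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

pred_e, true_e = np.array([0.6, 0.4]), np.array([0.7, 0.3])
pred_t, true_t = np.array([0.5, 0.5]), np.array([0.4, 0.6])
ncsl_loss = kl(pred_e, true_e) + kl(pred_t, true_t)
print(round(ncsl_loss, 4))
```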
III-C Training the Complete Model
We formalize the complete process of our model as follows.
Cross-Supervised Joint-Event-Extraction. The objective of our task is to optimize the following loss:

$$\min \; \mathcal{L} = \mathcal{L}_s + \lambda \mathcal{L}_c,$$

where $\mathcal{L}_s$ is the sequence labeling loss, $\mathcal{L}_c$ is the cross-supervised loss, and $\lambda$ is a hyper-parameter balancing the two terms.
As illustrated in Figure 2, the model implements the sequence-to-sequence labeling with an embedding layer, which embeds the input sentences as sequences of vectors, and a Bidirectional Long Short-Term Memory (Bi-LSTM) network [DBLP:conf/aclnut/LimsopathamC16] of the RNN [DBLP:conf/nips/SutskeverVL14] family, which predicts the tag distribution from the embedded vector sequences. Training applies back-propagation with the Adam optimizer [DBLP:journals/corr/KingmaB14] to optimize this loss function.
From Eq. 11, we observe that our task is equivalent to the sequence-to-sequence method when the weight of the cross-supervised term is zero ($\lambda = 0$). Therefore, our model can easily be implemented in an end-to-end framework, with the extra supervision information carried by the co-occurrence relationships. The novelty of our approach lies in the cross-supervised mechanism, which incorporates the indirect co-occurrence relationships collected from the “entity-trigger” HIN along meta-paths (cf. $\mathcal{L}_c$ in Eq. 11) into the joint-event-extraction task. This mechanism aims to maximize the utilization efficiency of the training data, so that more effective information is considered to improve the performance of joint-event-extraction.
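The combined objective can be sketched in one line; `lam` stands for the balance hyper-parameter weighting the cross-supervised term, and setting it to zero recovers plain sequence-to-sequence labeling as noted above:

```python
# Sketch of the combined training objective: sequence labeling loss plus the
# cross-supervised loss scaled by a balance hyper-parameter lam.
def total_loss(seq_loss, cross_loss, lam):
    return seq_loss + lam * cross_loss

print(total_loss(0.58, 0.12, 0.0))  # 0.58: pure sequence-to-sequence labeling
print(total_loss(0.58, 0.12, 1.0))  # cross-supervision fully included
```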
IV Experiment and Analysis
We compare our model with some state-of-the-art methods to verify the effectiveness of the proposed mechanism.
We evaluate our model on four widely used real-world datasets. ACE 2005 is a corpus developed by the Linguistic Data Consortium (LDC) [ldc]. NYT is an annotated corpus provided by The New York Times Newsroom [nyt]. CoNLL 2002 [conll] is a Spanish corpus made available by the Spanish EFE News Agency. WebNLG is a corpus introduced by Gardent et al. [DBLP:conf/acl/GardentSNP17] in the challenge of natural language generation, which also contains entity labels. Note that all the aforementioned datasets except ACE 2005 do not provide original ground-truth trigger annotations. Since the testing phase requires ground-truth trigger annotations to measure model performance, we use CoreNLP (https://stanfordnlp.github.io/CoreNLP/) to create the corresponding trigger annotations for these datasets. More details of our datasets are shown in Table I.
IV-B Comparison Baselines
We compare our method with some state-of-the-art baselines for event extraction.
Sequence-to-Sequence Joint Extraction (Seq2Seq) [DBLP:conf/aclnut/LimsopathamC16] [DBLP:conf/acl/ZhengWBHZX17] is a joint extraction method that we implemented in the sequence-to-sequence framework with a joint tag set containing tags for both entities and triggers.
Conditional Random Field Joint Extraction (CRF) [DBLP:conf/www/AlzaidyCG19] extends the basic sequence-to-sequence framework with a conditional random field (CRF) layer that constrains the order of output tags.
GCN [DBLP:conf/acl/FuLM19] jointly extracts entities and triggers by considering the context information with graph convolution network (GCN) layers behind the BiLSTM module.
Joint Event Extraction (JEE) [DBLP:conf/naacl/YangM16] is a joint statistical method based on the structural dependencies between entities and triggers.
Joint Transition (JT) [DBLP:conf/ijcai/ZhangQZLJ19] models the parsing process of a sentence as a transition system and proposes a neural transition framework to predict future transitions given the tokens and the learned transition system.
IV-C Evaluation Metrics
To evaluate the performance of our proposed model, we adopt several prevalent metrics, i.e., precision, recall, and F1 score, which are widely used in the field of event extraction. Precision and recall are defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$

where $TP$ is the true positive frequency, $FP$ is the false positive frequency, and $FN$ is the false negative frequency. These quantities are measured from the predicted tags of a model against the ground-truth tags of the testing samples. In our setting, for a specific model, $TP$ records the number of predicted tags matching the corresponding ground-truth tags for entities and triggers; $FP$ records the frequency of predicted tags conflicting with the corresponding ground-truth tags; and $FN$ records the number of entities and triggers missed by the model.
The F1 score, $F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$, measures the joint performance of a model by considering precision and recall simultaneously.
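These metrics can be computed from aligned tag sequences; the sketch below uses a simplified token-level count (the exact matching protocol of the experiments may differ) and illustrative tags:

```python
# Token-level precision/recall/F1 sketch: count matches against ground truth,
# treating the outside tag "O" as a non-prediction.
def prf(pred, gold):
    tp = sum(1 for p, g in zip(pred, gold) if p == g and g != "O")
    fp = sum(1 for p, g in zip(pred, gold) if p != "O" and p != g)
    fn = sum(1 for p, g in zip(pred, gold) if g != "O" and p != g)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

pred = ["B-PER", "O", "B-Movement", "B-GPE"]
gold = ["B-PER", "O", "B-Movement", "B-ORG"]
p, r, f = prf(pred, gold)
print(p, r, f)  # each 2/3: one wrong tag costs both a FP and a FN
```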
IV-D Implementation Details
Since our aim is to incorporate the indirect co-occurrence relationships between entities and their triggers into the joint-event-extraction task, rather than to investigate the influence of pre-trained features on different models, we implement all models in Section IV-B without any pre-trained features on our prototype system. Furthermore, to compare all methods fairly, all neural network models share the same LSTM module (a Bi-LSTM with 128 hidden dimensions and 2 hidden layers) as the basic semantic embedding. Moreover, all neural network models are trained with the Adam optimizer [DBLP:journals/corr/KingmaB14] with the same learning rate (0.02) for 30 training epochs. During training, we set the word embedding dimension to 300, the batch size to 256, and the dropout rate to 0.5.
HIN Generation. Our model requires HINs to convert between entity and trigger distributions, so we generate the required HINs in a preprocessing step by merging all ground-truth triggers and entities, together with their relationships and types, from the training data. For each training run, the HIN is re-generated from the corresponding training data. During testing, the entity distribution is translated into the trigger distribution according to this HIN, without using any co-occurrence relationships between entities and triggers in the testing data. Moreover, our HINs are generated from the basic event types, since HINs based on event subtypes are too sparse to reveal effective indirect co-occurrence relationships.
In the following experiments, we compare the precision, recall, and F1 scores of all methods under 10-fold cross-validation: we randomly split the original data into 10 disjoint subsets, train the models on 9 of these subsets, and test them on the remaining subset; this procedure is repeated 10 times. We report the means and variances of the results below. Furthermore, to compare the models on recognizing effective event factors, we exclude the results for tokens labelled with the outside tag (“O”) for all methods.
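The 10-fold protocol can be sketched as follows (our own implementation, illustrating the disjoint random split described above):

```python
import random

# 10-fold cross-validation sketch: shuffle once, carve 10 disjoint folds,
# then train on nine folds and test on the held-out one, ten times.
def ten_folds(samples, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    for k in range(10):
        held_out = set(folds[k])
        train = [samples[i] for i in idx if i not in held_out]
        test = [samples[i] for i in folds[k]]
        yield train, test

data = list(range(100))
splits = list(ten_folds(data))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```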
IV-E Experimental Results
The results of the comparison experiment on all datasets are reported in Table II. We observe that, with the cross-supervised mechanism provided by the NCSL layer, both variants of CSM surpass all the state-of-the-art methods. Furthermore, we also measure the mean performance on entity and trigger extraction separately on the ACE 2005 dataset for all methods; these results are reported in Table III. Our model outperforms the alternative models on both the joint task and the sub-tasks, which verifies that the extraction performance is indeed improved by the indirect co-occurrence relationships collected along the meta-paths of heterogeneous information networks.
IV-F Sensitivity Analysis
We analyze the influence of the training ratio (from 5- to 10-fold cross-validation) and the length of meta-paths on the performance of our model. These experiments are performed on the ACE 2005 dataset, and each is repeated 10 times. The mean results are reported in Figure 3. As shown in Figure 3(a), our model achieves the best performance with a meta-path length of 3. The reason is that most of the ACE 2005 data are in the “entity-trigger-entity” form, so our model performs well when the meta-path length is a multiple of 3. Furthermore, from Figure 3(b), we see that our model also performs well when the training ratio is large, which confirms the intuition that more training data lead to better performance.
IV-G Case Study
To illustrate the improvement of our model on the extraction task, we examine typical cases from the ACE 2005 dataset. These cases are presented in Figure 4, where “Oracle” denotes the ground-truth annotation. We observe that for simple sentences, both the sequence-to-sequence method and our model annotate accurately. However, as the sentence becomes more complex (cf. the bottom sentence in Figure 4), the sequence-to-sequence method hardly annotates accurately the entities that are far from the trigger, while our method maintains stable performance. This further shows that our method can extract useful latent patterns along the meta-paths.
In this paper, we have proposed a novel cross-supervised mechanism that allows models to extract entities and triggers jointly. Our mechanism alternately supervises the extraction process for the triggers and the entities, each based on the type distribution of the other. In this way, we incorporate the co-occurrence relationships between entities and triggers into the joint-event-extraction process. Moreover, to further address the problem caused by sparse co-occurrence relationships, our method resorts to the heterogeneous information network technique to collect indirect co-occurrence relationships. The empirical results show that our method improves the extraction performance for entities and triggers simultaneously. This verifies that the incorporated co-occurrence relationships are useful for the joint-event-extraction task and that our method is more effective than existing methods at utilizing training samples. Our future work includes: (a) investigating the impact of the length of sampled meta-paths, as in this paper we limited meta-paths to a fixed length; and (b) connecting the extracted entities and triggers from a corpus to facilitate automatic knowledge graph construction.
This work is supported by the National Natural Science Foundation of China (Grant No. 61976235, 61602535, 61503422), Program for Innovation Research in Central University of Finance and Economics, and the Foundation of State Key Laboratory of Cognitive Intelligence (Grant No. COGOSC-20190002), iFLYTEK, P. R. China. This work is also supported in part by NSF under grants III-1763325, III-1909323, and SaTC-1930941.