In the era of information explosion, there is an urgent need for technology that captures the critical information of target events from large volumes of text (DBLP:journals/access/XiangW19; ijcai2020-693; ma2021comprehensive; su2021comprehensive). Event extraction technology helps locate the text describing a specific event type and find the essential arguments of the event in that text. Many scholars have therefore studied neural network-based event extraction (DBLP:conf/emnlp/WangZH19; DBLP:conf/aaai/NguyenN19; DBLP:conf/aaai/Zhou0ZWXL21).
Event extraction (EE) is a task that acquires semantic and structural knowledge (i.e., events) from texts. Each event is represented by a typed phrase containing a trigger word, which mainly indicates the occurrence of the event, and some other arguments that ensure semantic completeness. A typical EE task involves two main sub-tasks: type classification, which identifies the event types expressed in a sentence, and element extraction, which extracts the elements, including triggers and arguments, under different role patterns (i.e., schemas). As shown in Figure 1, an example sentence contains two events, of types Die and Attack. For the Die event, “died” is the trigger, and the arguments “Baghdad”, “cameraman” and “American tank” take the roles Place, Victim and Instrument, respectively. For the Attack event, “fired” is the trigger, the arguments “Baghdad” and “American tank” take the roles Place and Instrument, respectively, and the arguments “cameraman” and “Palestine Hotel” both take the role Target. The example shows that each event type requires its elements to be extracted with its own extraction schema.
Previous work (DBLP:conf/naacl/FergusonLWH18; DBLP:conf/emnlp/WangZH19; DBLP:conf/aaai/NguyenN19) on neural network-based EE usually follows an isolated learning paradigm, performing extraction independently for each event type. This can lead to low performance on less frequent types, which do not have enough contextual information. We argue, however, that EE benefits from associating closely related types collectively. For the event type classification task, event types in different sentences can be highly related. For example, a document may contain both the event types Execute and Attack, which describe the same topic despite appearing in different sentences. In real event extraction datasets such as ACE 2005 (DBLP:conf/lrec/DoddingtonMPRSW04), 1236 sentences belong to the type Attack, but only 19 sentences belong to the type Execute. Existing methods (DBLP:conf/acl/HuangCFJVHS16; DBLP:conf/emnlp/SubburathinamLJ19) classify the event types separately for each sentence in a document and fail to handle these connections among types explicitly. Connecting the two sentences to learn the association between the two types provides complementary contextual information for the less frequent type Execute.
On the other hand, argument roles in different event types can also be correlated. In ACE 2005, some arguments with the same meaning are defined as different element roles, such as the role Time-At-Beginning in type Attack and the role Time-Before in type TransferOwnership. Traditional schema-based event extraction models (DBLP:conf/acl/ChenXLZ015; DBLP:conf/emnlp/WaddenWLH19) nevertheless treat these semantically related roles as distinct and extract each element independently with its own role pattern. This also neglects the relevant information shared across roles.
To deal with the above issues, we propose a novel neural network-based event extraction framework that learns associations among event types and relevant information in argument roles, referred to as associated EE (AEE). Given a document, we perform EE for all of its sentences. First, the type classification task is conducted in a graph setting. We design a document-level graph with both sentences and words represented as nodes and four kinds of relationships among sentences and words represented as edges. We then employ a graph attention network to learn sentence embeddings and perform classification on these learned embeddings. In this way, AEE couples the type classification tasks of multiple sentences in a unified graph and enables implicit context sharing among sentences across different types. Second, for the element extraction task, we design a new schema on ACE 2005 that constructs associations among argument roles. The schema merges highly related argument roles into one and then keeps all roles in a single universal pattern for all event types, so that different roles can share knowledge within one pattern. A parameter inheritance method is further introduced to draw on the type knowledge from event classification and to help the extraction model focus on learning type-preferred roles in the universal schema. In the original schema, the argument extraction model requires the event type as input, which makes a parameter inheritance mechanism difficult to introduce. By modeling correlations among types and roles together, AEE may further enrich contextual information across different types and roles and alleviate the low accuracy in EE caused by uneven data distribution. Specifically, our contributions are as follows:
We propose a novel neural associated event extraction framework that learns shared information among event types for the event classification task and among argument roles for the element extraction task.
For event types, we design a document-level graph and use a graph attention network to learn connections among sentence nodes. For argument roles, we build a new universal schema that unifies the role patterns of all event types and realizes knowledge sharing among roles. A parameter inheritance mechanism is further developed to facilitate type-preferred role identification for different event types.
Experimental results show that our approach consistently outperforms most state-of-the-art event extraction methods. In particular, our method performs more steadily than existing methods, especially on types and roles with fewer data.
2 Related Work
The purpose of event extraction is to capture the event types of interest from large amounts of text and to present the essential arguments of events in a structured form (10.1145/3442381.3449834). Depending on whether an event schema is constructed for extracting triggers and arguments, the task is divided into open-domain event extraction (DBLP:journals/corr/abs-1912-11334; DBLP:conf/acl/LiuHZ19; DBLP:conf/emnlp/WangZH19) and schema-based event extraction (DBLP:conf/acl/HuangCFJVHS16; DBLP:conf/naacl/YangM16; DBLP:conf/naacl/FergusonLWH18; DBLP:conf/aaai/AhmadPC21).
Most existing open-domain studies address event extraction in social networks (DBLP:conf/kdd/RitterMEC12; DBLP:conf/esws/KatsiosVKP15; DBLP:journals/nle/KunnemanB16). Social networks contain many kinds of events, and new event types emerge over time. Methods built around a fixed set of event types are therefore poorly suited to social network events, which motivates the study of open-domain event extraction (DBLP:conf/acl/LiuHZ19; DBLP:journals/corr/abs-2009-10047). Open-domain methods obtain topics by clustering or classification and then extract event arguments from the texts related to each topic (DBLP:journals/corr/abs-1912-11334). The current mainstream approach first clusters the texts and then extracts keywords from each cluster as event arguments (DBLP:journals/corr/abs-1912-11334). Open-domain event extraction does not require extensive data annotation, but it cannot target a specific event category (DBLP:conf/lpkm/MejriA17). Moreover, neural network-based event extraction in the open domain mainly extracts keywords, which do not always correspond to the core arguments of the event. How to build a neural network-based event extraction framework that shares information between classes, avoids the spread of error information, and still provides a standard extraction template therefore remains a difficult problem in event extraction.
Schema-based event extraction requires classifying the event types and extracting the specified triggers and arguments under predefined schemas or role patterns (DBLP:conf/emnlp/WangWHLLLSZR19; DBLP:conf/acl/YangFQKL19). ACE 2005 (DBLP:conf/lrec/DoddingtonMPRSW04) is the most popular dataset for schema-based event extraction. Methods for this task fall into two categories. Pipeline-based methods first identify triggers and classify event types (DBLP:conf/acl/ChenXLZ015; DBLP:conf/emnlp/SubburathinamLJ19), and then extract arguments according to the predicted event type and triggers (DBLP:conf/emnlp/WaddenWLH19). To overcome the error propagation caused by the earlier sub-tasks, joint event extraction methods have been proposed (DBLP:conf/ijcai/ZhangQZLJ19; DBLP:conf/naacl/LiHJ019; DBLP:conf/acl/LinJHW20). They reduce error propagation by combining the trigger identification and argument extraction tasks. JMEE (DBLP:conf/emnlp/LiuLH18) and DBRNN (DBLP:conf/aaai/ShaQCS18) have proved influential in introducing graph information into event extraction. However, all of these methods depend on the result of event classification during argument extraction. This may limit the information shared across event types, which is a disadvantage for neural network-based event extraction with little labeled data.
More recent joint methods (DBLP:conf/ijcai/ZhangQZLJ19; DBLP:conf/naacl/LiHJ019; DBLP:conf/naacl/WenLLPLLZLWZYDW21) follow the same idea of combining trigger recognition and argument extraction to limit the propagation of event detection errors. Nevertheless, neither pipeline-based nor joint event extraction can avoid the impact of event type prediction errors on argument extraction. Moreover, these methods learn each event type independently and cannot share information across event types, which again hurts neural network-based event extraction when only a small amount of labeled data is available.
3 Event Extraction Framework
We design an event extraction framework, AEE, that learns associations among event types and argument roles. The framework has two parts: event type classification and element extraction. Both rely on feature representations learned by BERT to capture sentence semantics. For event type classification, we construct for each document a document-level graph that connects sentences of different types and their words. A document-level graph attention network (DGAT) learns the sentence node representations and thereby supplies context information for the current sentence. For element extraction, we design a new role schema on ACE that builds associations between argument roles. All event types use the new schema, which unifies the arguments of different event types and realizes knowledge sharing among them. It reduces the transmission of error information, because extracting event arguments no longer requires a different schema for each event category. The arguments in the universal schema are mapped back to the original event schema according to the event type. The universal schema thus exploits the information shared between arguments, suits event extraction with only a small amount of labeled data, and alleviates data imbalance by combining multiple roles. However, it may mix the roles of different types during element extraction. We therefore further design a parameter inheritance method that brings in the features learned from event type classification.
3.1 Event Type Classification
Event type classification predicts the event type of the current sentence. The sentence is first passed into a BERT model, which captures the context representation of each word. The word vector matrix output by BERT then gains more global information through the document-level graph attention network (DGAT). Finally, the event type is predicted from the sentence node of each sentence.
We construct a graph for each document, in which each word is treated as a node. We leverage lexical knowledge to connect nodes and introduce a sentence node that connects all nodes in a sentence (DBLP:conf/emnlp/GuiZZPFWH19). There are four kinds of edges between nodes, as shown in Figure 3, where the gray node represents the sentence node and each of the other colors corresponds to a sentence in the document. The first is the word-in-word connection: the words in a lexicon are connected one by one until the last word is reached. The second is the edge between lexicons: the first word of the former lexicon is connected with the first word of the latter lexicon. The third involves the sentence node, which serves as the representation of the sentence: it is connected to all the nodes in the sentence, so it gathers information from all edges and nodes and eliminates boundary ambiguity between words. The last connects sentence nodes one by one to capture the relationship among event types across sentences.
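The four edge kinds can be illustrated with a small sketch. It is not the authors' implementation: the input layout (a document as a list of sentences, each a list of lexicon entries, each a list of words), the node naming scheme, and the edge labels are all assumptions made for illustration.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source node, target node, edge kind)


def build_document_graph(doc: List[List[List[str]]]) -> List[Edge]:
    """Build the edge list of a document-level graph.

    doc[s][l] is the list of words in lexicon entry l of sentence s
    (a hypothetical input layout chosen for this sketch).
    """
    edges: List[Edge] = []
    for s, sentence in enumerate(doc):
        sent_node = f"S{s}"
        prev_first = None  # first word of the previous lexicon entry
        for l, lexicon in enumerate(sentence):
            names = [f"w{s}.{l}.{k}" for k in range(len(lexicon))]
            if not names:
                continue
            # 1) word-in-word: chain the words inside one lexicon entry
            for a, b in zip(names, names[1:]):
                edges.append((a, b, "word_in_word"))
            # 2) lexicon-to-lexicon: first word of the former entry
            #    to first word of the latter entry
            if prev_first is not None:
                edges.append((prev_first, names[0], "lexicon"))
            prev_first = names[0]
            # 3) sentence node to every word node of its sentence
            for n in names:
                edges.append((sent_node, n, "sentence_word"))
        # 4) consecutive sentence nodes, linking sentences across the document
        if s > 0:
            edges.append((f"S{s - 1}", sent_node, "sentence_sentence"))
    return edges
```

For a two-sentence document such as `[[["American", "tank"], ["fired"]], [["cameraman"], ["died"]]]`, the sketch yields one word-in-word edge, two lexicon edges, five sentence-word edges, and one sentence-sentence edge.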
The DGAT model learns the structural representation of each node in the document. Message passing over the graph yields the representation of each sentence node, which is then used for the classification decision. The probability of a label sequence is computed from this representation and normalized over the set of all possible label sequences. The weight and bias parameters are specific to each pair of labels, the number of labels equals the number of event types, and the sequence length equals the input length of the text. We minimize the cross-entropy loss between the real labels and the predicted labels.
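As a rough illustration of this step, the sketch below implements one generic single-head graph attention layer and a softmax cross-entropy loss in plain Python. It follows the standard graph attention formulation rather than the exact DGAT equations, which are not recoverable here; the function names, the neighbour-list input, and the LeakyReLU slope are assumptions.

```python
import math
from typing import List

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(h: List[Vector], neighbours: List[List[int]],
              W: List[Vector], a: Vector) -> List[Vector]:
    """One single-head graph attention layer (generic form, not DGAT itself).

    h[i] is the input feature of node i, neighbours[i] lists the nodes that
    node i attends to (include i itself for a self-loop), W is a
    d_in x d_out matrix, and a is an attention vector of length 2 * d_out.
    """
    d_out = len(W[0])
    # Linear projection z_i = h_i W
    z = [[sum(x * W[k][j] for k, x in enumerate(row)) for j in range(d_out)]
         for row in h]
    out = []
    for i, nbrs in enumerate(neighbours):
        # Unnormalised score e_ij = LeakyReLU(a^T [z_i || z_j])
        scores = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j])))
                  for j in nbrs]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Attention-weighted sum of the neighbours' projections
        out.append([sum(w / total * z[j][c] for w, j in zip(weights, nbrs))
                    for c in range(d_out)])
    return out


def cross_entropy(logits: Vector, label: int) -> float:
    """Softmax cross-entropy of one sentence node against its gold event type."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_norm - logits[label]
```

Stacking several such layers over the document graph lets a sentence node aggregate information from words and from neighbouring sentence nodes before its representation is classified.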
3.2 Element Extraction
For a given event type, element extraction extracts the trigger and the arguments related to the event type and classifies the roles these arguments play. An argument may play multiple roles, and a word can belong to different arguments. We add multiple sets of binary classifiers on top of BERT, each set serving one role and determining the spans of all arguments that belong to it. For each word and each role, one classifier predicts the probability that the word is the start position of an argument of that role, and another predicts the probability that it is the end position. Each role has its own learnable parameters for detecting the start and end positions, and the number of classifier sets equals the number of element roles. The input length of the element extraction model is the same as that of the event type classification model, so that the element extraction model can use the parameter inheritance module described in Section 3.2.2. Both probabilities are computed from the token representation produced by BERT followed by a Bi-LSTM.
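One way to turn per-role start and end probabilities into argument spans is sketched below. The decoding rule (threshold both probabilities, then pair each start with the nearest end at or after it) and the threshold value are assumptions; the text does not specify how spans are decoded.

```python
from typing import Dict, List, Tuple


def decode_spans(start_probs: Dict[str, List[float]],
                 end_probs: Dict[str, List[float]],
                 threshold: float = 0.5) -> Dict[str, List[Tuple[int, int]]]:
    """Decode argument spans independently for every role.

    start_probs[role][i] / end_probs[role][i] is the probability that token i
    starts / ends an argument of that role. Because every role is decoded on
    its own, one token can appear in spans of several roles.
    """
    spans: Dict[str, List[Tuple[int, int]]] = {}
    for role in start_probs:
        ends = [i for i, p in enumerate(end_probs[role]) if p > threshold]
        role_spans = []
        for i, p in enumerate(start_probs[role]):
            if p > threshold:
                # Pair the start with the nearest end at or after it
                j = next((e for e in ends if e >= i), None)
                if j is not None:
                    role_spans.append((i, j))
        spans[role] = role_spans
    return spans
```

Decoding each role separately is what allows an argument to play multiple roles: the same token span can be returned under two role keys.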
3.2.1 Universal Schema Construction
To reduce the impact of error propagation among event types, event triggers and arguments, we generalize the event argument roles. In ACE 2005, some argument roles in different event types have similar properties, such as Time-At-Beginning and Time-Before. In the original event schema, different event types also have different numbers of arguments. We construct a universal schema that covers the event arguments of all event types, divided mainly according to the grammatical relationship between elements and events. The schema has 14 event argument roles. Table 1 shows the correspondence between the original schema, with its 35 event argument roles, and our universal schema.
|No.||Our Schema||ACE 2005 Schema|
|1||Time-Begin||Time-At-Beginning, Time-Before, Time-Starting|
|4||Time-End||Time-After, Time-At-End, Time-Ending|
|5||Objective||Adjudicator, Recipient, Seller|
|6||Initiator||Agent, Attacker, Buyer, Giver, Plaintiff, Prosecutor|
|7||Receiver||Defendant, Victim, Beneficiary, Target|
|8||Entity||Entity, Instrument, Vehicle|
|9||Figure||Person, Org, Artifact|
Unlike traditional schema-based event extraction methods, we construct a universal schema for all event types. First, the original argument roles are converted to our universal schema. We then use our event extraction model to predict the event types and arguments. Finally, we revert the predicted roles to the original event argument roles according to the universal schema. Specifically, compared with existing methods, the universal schema-based method adds the following two steps:
1. Element role conversion. We replace the argument roles of the original schema with the new argument roles of the universal schema. The content and position of each argument remain unchanged.
2. Element role reversion. After event extraction is complete, we revert the predicted event argument roles to the original event argument roles according to the predicted event type.
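The two steps can be sketched as a pair of lookups. The mapping below copies only the rows of Table 1 that are shown; the per-type role inventories in `ROLES_BY_EVENT_TYPE` are a small illustrative subset assumed for this example, and the function names are hypothetical.

```python
from typing import Optional

# Rows of Table 1: universal role -> original ACE 2005 roles
UNIVERSAL_TO_ACE = {
    "Time-Begin": ["Time-At-Beginning", "Time-Before", "Time-Starting"],
    "Time-End": ["Time-After", "Time-At-End", "Time-Ending"],
    "Objective": ["Adjudicator", "Recipient", "Seller"],
    "Initiator": ["Agent", "Attacker", "Buyer", "Giver", "Plaintiff", "Prosecutor"],
    "Receiver": ["Defendant", "Victim", "Beneficiary", "Target"],
    "Entity": ["Entity", "Instrument", "Vehicle"],
    "Figure": ["Person", "Org", "Artifact"],
}
ACE_TO_UNIVERSAL = {ace: uni for uni, aces in UNIVERSAL_TO_ACE.items() for ace in aces}

# Assumed, partial role inventories for two event types (illustration only)
ROLES_BY_EVENT_TYPE = {
    "Attack": {"Attacker", "Target", "Instrument", "Place"},
    "Die": {"Agent", "Victim", "Instrument", "Place"},
}


def convert_role(ace_role: str) -> str:
    """Step 1: map an original role to its universal role (unmapped roles pass through)."""
    return ACE_TO_UNIVERSAL.get(ace_role, ace_role)


def revert_role(universal_role: str, event_type: str) -> Optional[str]:
    """Step 2: map a predicted universal role back, using the predicted event type."""
    allowed = ROLES_BY_EVENT_TYPE[event_type]
    candidates = [r for r in UNIVERSAL_TO_ACE.get(universal_role, [universal_role])
                  if r in allowed]
    # Revert only when the event type leaves exactly one candidate
    return candidates[0] if len(candidates) == 1 else None
```

With these tables, `convert_role("Attacker")` gives `"Initiator"`, and `revert_role("Initiator", "Die")` gives `"Agent"`, since Agent is the only Initiator role in the Die inventory.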
3.2.2 Parameters Inheritance
To introduce event type classification knowledge into event element extraction, we use parameter inheritance to initialize the parameters of the element extraction model. The event extraction model has two sub-tasks, event type classification and event element extraction. The BERT encoding layer of the event type classification model encodes the input sentences; after the training stage, it yields its trained parameters and a vector for each token in the sentence. When switching to the element extraction model, the input is the tokens of the same sentence, without the event type. This is the main difference from existing schema-based methods, and it avoids propagating the errors of the event type classification model.
After training, the parameters of the BERT encoding layer of the event type classification model are fixed, and the parameters of the BERT layer of the element extraction model are randomly initialized. The Bi-LSTM layer of the element extraction model receives input from its own BERT layer and, through a horizontal connection, from the BERT layer of the event type classification model. Features are thus transferred between the two BERT layers through this connection.
Each BERT layer of the element extraction model combines two terms: its own weight matrix applied to its own input, and a horizontal connection matrix applied to the output of a BERT layer of the event type classification model. The dimensions of these matrices are determined by the dimension of each token and the input length of the element extraction model. An activation function then makes every input of the intermediate BERT layers non-negative. Through the parameter inheritance mechanism, the word features trained by the BERT of the event type classification model are transferred to the element extraction model, so the latter inherits the prior knowledge of the former without damaging the original task sequence.
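A minimal sketch of such a horizontal connection, written in the style of a progressive-network lateral connection, is given below. The exact layer indexing and matrix shapes of the original formula are not recoverable, so the form `relu(W h_own + U h_src)`, the use of ReLU as the non-negativity function, and all names are assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Multiply matrix M by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def relu(v: Vector) -> Vector:
    """Clamp negative entries to zero, so the layer output is non-negative."""
    return [max(0.0, x) for x in v]


def lateral_layer(W: Matrix, U: Matrix, h_own: Vector, h_src: Vector) -> Vector:
    """Combine the extraction model's own input with frozen classifier features.

    W is the trainable weight of the element extraction layer, U is the
    horizontal connection applied to h_src, the (frozen) token feature from
    the event type classification model.
    """
    own = matvec(W, h_own)
    lateral = matvec(U, h_src)
    return relu([a + b for a, b in zip(own, lateral)])
```

Keeping the source features frozen and routing them through a separate matrix `U` means the element extraction model can use the classifier's knowledge without overwriting it.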
|Model||Event Type Classification: P||R||F1||Event Trigger Identification: P||R||F1|
|Chen et al. (DBLP:journals/corr/abs-1912-01586)||66.70||74.70||70.50||68.90||77.30||72.90|
|Du et al. (DBLP:conf/emnlp/DuC20)||71.12||73.70||72.39||74.29||77.42||75.82|
|Associated event extraction (AEE)||83.34||81.96||82.37||86.90||85.68||86.71|
|Model||Event Argument Identification: P||R||F1||Argument Role Classification: P||R||F1|
|Chen et al. (DBLP:journals/corr/abs-1912-01586)||44.90||41.20||43.00||44.30||40.70||42.40|
|Du et al. (DBLP:conf/emnlp/DuC20)||58.90||52.08||55.29||56.77||50.24||53.31|
|Associated event extraction (AEE)||74.87||65.24||70.35||67.82||55.66||61.42|
3.2.3 Joint Loss
For the element extraction sub-task, we design the joint loss function as the weighted sum of three losses: the starting position prediction loss, the ending position prediction loss, and the event type classification loss. The starting and ending position losses are cross-entropy losses computed over the element role set and the tokens of the input sentence. The overall loss for optimizing the element extraction model combines the three terms with hyper-parameters that balance the event classification loss against the element role extraction losses.
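The weighted combination can be sketched as follows. The weight names and default values, and the averaging of the binary cross-entropy over tokens, are assumptions; the original formula and hyper-parameter values are not available.

```python
import math
from typing import Dict, List


def binary_cross_entropy(probs: List[float], targets: List[int],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over the tokens of one sentence."""
    total = sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets))
    return -total / len(probs)


def position_loss(pred: Dict[str, List[float]],
                  gold: Dict[str, List[int]]) -> float:
    """Sum the start (or end) position loss over all element roles."""
    return sum(binary_cross_entropy(pred[role], gold[role]) for role in gold)


def joint_loss(start_loss: float, end_loss: float, type_loss: float,
               w_start: float = 1.0, w_end: float = 1.0,
               w_type: float = 1.0) -> float:
    """Weighted sum of the three losses (weights are hypothetical hyper-parameters)."""
    return w_start * start_loss + w_end * end_loss + w_type * type_loss
```

Raising `w_type` relative to the position weights would push the extraction model to stay consistent with the event classification signal, at the cost of weaker pressure on span boundaries.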
4 Datasets and Settings
The proposed extraction method, along with the compared approaches, is tested on Automatic Content Extraction (ACE) 2005 DBLP:conf/lrec/DoddingtonMPRSW04. It contains 599 documents, annotated with eight event types, 33 event subtypes, and 35 argument roles. The ACE dataset is divided into training, validation, and test sets at an 8:1:1 ratio. We implement our model based on BERT DBLP:conf/naacl/DevlinCLT19, which has 12 layers, 768-dimensional hidden embeddings, and 12 attention heads. The initial learning rate is tuned separately for the BERT parameters and for the other parameters. The maximum sequence length is 512, the optimizer is Adam, and the maximum gradient norm for gradient clipping is set to 1.0. The model is trained for 15 epochs with a batch size of 8, and the maximum number of training epochs is set to 20. The optimal hyperparameters are tuned on the validation set by grid search, and each hyperparameter setting is tried five times. We evaluate our model and the comparison models on the event classification (EC), trigger identification (TI), argument identification (AI), and argument role classification (ARC) sub-tasks. The evaluation metrics are precision (P), recall (R), and F1.
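For reference, the three metrics follow their standard definitions, sketched below. The function names are illustrative, and the counts in the usage note are made up.

```python
from typing import Tuple


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute P, R and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)
```

For example, 8 correct predictions with 2 spurious and 8 missed items give P = 0.8 and R = 0.5. As a consistency check, the precision 71.12 and recall 73.70 reported for Du et al. on event type classification give an F1 of 72.39, matching the reported value.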
Comparisons. We compare our extraction method with eight event extraction methods. DBRNN DBLP:conf/aaai/ShaQCS18 leverages dependency bridges over a tree structure to carry syntactically related information when modeling each word. JMEE DBLP:conf/emnlp/LiuLH18 introduces an attention-based GCN to model graph information based on syntactic structure. Joint3EE DBLP:conf/aaai/NguyenN19 is a multi-task model that performs entity recognition, trigger detection, and argument role assignment with shared Bi-GRU hidden representations. GAIL-ELMO DBLP:journals/dint/ZhangJS19 is an ELMo-based model that uses a generative adversarial network to focus on harder-to-detect events. PLMEE DBLP:conf/acl/YangFQKL19 is a BERT-based pipeline method that completes the trigger identification, event classification, and argument extraction sub-tasks in succession. Du et al. DBLP:conf/emnlp/DuC20 design a question answering method that conveniently performs data augmentation by constructing multiple questions for a single argument. Chen et al. DBLP:journals/corr/abs-1912-01586 use bleached statements to give a model access to the information contained in annotation manuals. MQAEE DBLP:conf/emnlp/LiPCWPLZ20 is a multi-turn question answering method that exploits argument relationships within the same event type by introducing the history of answers.
5 Experiments and Results
5.1 Experimental Results on ACE 2005
Table 2 reports the performance of our model on the three evaluation metrics for the event type classification and event trigger identification sub-tasks. Compared with Du et al. DBLP:conf/emnlp/DuC20, MQAEE DBLP:conf/emnlp/LiPCWPLZ20 and PLMEE DBLP:conf/acl/YangFQKL19, our model achieves a higher F1-score on EC. This shows that our method is significantly superior to the MRC methods, which ignore the association between event types. PLMEE DBLP:conf/acl/YangFQKL19 first identifies event triggers and then judges the event type according to the triggers, which causes error propagation. Du et al. DBLP:conf/emnlp/DuC20 and MQAEE DBLP:conf/emnlp/LiPCWPLZ20 judge the event type directly from the text, which leaves out event trigger knowledge and lowers event type classification performance. In our model, by contrast, the associations established among event types make up for the missing event trigger information.
Table 3 shows the performance on the event argument identification and argument role classification sub-tasks. Compared with Joint3EE DBLP:conf/aaai/NguyenN19, our model improves the F1-score on both sub-tasks. Our method uses the knowledge of event classification to extract event arguments, but it does not directly determine the extracted argument role from the result of event classification. This shows that our method improves argument extraction while avoiding error propagation. Compared with MQAEE DBLP:conf/emnlp/LiPCWPLZ20, our model improves the F1-score on both AI and ARC, which shows that our method is significantly superior to the multi-turn MRC methods that only use relationships among arguments within one type. Our model also consistently outperforms PLMEE DBLP:conf/acl/YangFQKL19, the best-performing baseline that does not involve external knowledge, in F1-score on both AI and ARC. The results show the importance of exploiting the relations among event types and element roles.
As shown in Table 2 and Table 3, our model achieves state-of-the-art performance on all four sub-tasks as measured by P and F1. This indicates that our approach delivers the best overall results by using argument relations across event types and knowledge inheritance from the event classification sub-task. Its precision tends to be higher than its recall, and on the AI sub-task its recall is lower than that of the best-performing baseline.
5.2 Impact on Different Event Type
As shown in Figure 4, we evaluate the event classification sub-task on every event type. The red line shows how the number of samples varies across event types; it follows a long-tail distribution. Our model keeps a consistently high precision on all event types except Ownership and Fine, whereas PLMEE shows low precision on seven types, all of which have few samples. Our model is therefore more stable across event types than PLMEE and improves the performance on event types with only a small number of samples. This demonstrates that our event classification model achieves cross-type knowledge sharing by introducing the DGAT.
5.3 Impact on Different Argument Roles
Our model considers the correlation of arguments among different event types and realizes knowledge sharing among them by constructing a universal event extraction schema. As shown in Figure 5, we compare, per argument role, our full model with a variant that removes the Parameters Inheritance (PI) module and the Universal Schema (US) module. Without the PI and US modules, performance degrades as the number of samples of an argument role decreases. Adding PI and US boosts performance on argument roles with few samples and keeps precision consistently high. The full model is stable across types and improves event argument extraction for event types with only a few samples. It achieves cross-type knowledge sharing by combining the universal schema with the parameter inheritance mechanism.
|Tasks||Event Type Classification: P||R||F1||Event Trigger Identification: P||R||F1|
|-Document-level Graph Attention network (DGAT)||80.34||79.50||80.37||86.21||84.19||86.33|
|-Parameters Inheritance (PI)||-||-||-||86.01||83.05||85.27|
|-PI-Universal Schema (PI-US)||-||-||-||85.21||82.43||84.10|
|-Joint Loss (JL)||-||-||-||86.93||84.62||86.56|
|Tasks||Event Argument Identification: P||R||F1||Argument Role Classification: P||R||F1|
|-Document-level Graph Attention network (DGAT)||72.02||65.19||70.01||67.92||55.25||61.18|
|-Parameters Inheritance (PI)||71.92||64.61||68.97||67.32||53.87||60.07|
|-PI-Universal Schema (PI-US)||70.44||63.45||67.31||65.83||51.75||59.12|
|-Joint Loss (JL)||73.41||65.13||70.12||67.20||54.83||61.22|
5.4 Impact of DGAT Layer Number
We test the event classification, trigger identification, argument identification, and argument role classification sub-tasks and observe the F1-score of our model with different numbers of DGAT layers. Figure 6 presents empirical evidence of the effect of layer depth on our document-level graph. The F1-score rises as the number of DGAT layers increases. This shows that DGAT captures contextual information at the document level, which carries more global knowledge than the sentence level. DGAT is conducive to event extraction on limited data, which requires more DGAT layers to reach the best performance. However, the improvement peaks at 5 DGAT layers across all sub-tasks, and adding further layers gives no additional gain. We therefore use 5 DGAT layers.
5.5 Ablation Study
We evaluate four variants of our approach, given in Table 4 and Table 5. First, we remove the Document-level Graph Attention network (DGAT) module. The F1-score decreases on all four sub-tasks (EC, TI, AI, and ARC), with the largest drop on EC, which shows that the DGAT module significantly improves the EC sub-task. It helps the model learn global knowledge, and EC knowledge is in turn helpful to element extraction. We further remove the PI module, which improves element extraction by using the knowledge of event classification. The F1-score drops on the TI, AI, and ARC sub-tasks, a noticeable change in performance. This suggests that inheriting the knowledge of event classification through the PI module effectively improves element extraction. Removing the US module requires removing the PI module as well. Compared with the full AEE, the F1-score then decreases further on TI, AI, and ARC, which suggests that the US module learns additional relations among argument roles. Finally, the Joint Loss (JL) module is employed in argument extraction to consider the EC loss and the argument extraction loss together; removing it also lowers the F1-score on TI, AI, and ARC. The results suggest that all modules are helpful and that the PI and US modules are the most important for element extraction, as removing them causes the most drastic performance degradation.
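The size of each drop can be recomputed from the tables. The sketch below copies the F1 values of the full AEE model from Tables 2 and 3 and of the four variants from Tables 4 and 5, assuming the third value in each table group is F1; the dictionary and function names are illustrative.

```python
from typing import Dict

# F1 of the full model (Tables 2 and 3)
FULL_F1 = {"EC": 82.37, "TI": 86.71, "AI": 70.35, "ARC": 61.42}

# F1 of each ablated variant (Tables 4 and 5); EC is reported only for -DGAT
ABLATED_F1 = {
    "-DGAT": {"EC": 80.37, "TI": 86.33, "AI": 70.01, "ARC": 61.18},
    "-PI": {"TI": 85.27, "AI": 68.97, "ARC": 60.07},
    "-PI-US": {"TI": 84.10, "AI": 67.31, "ARC": 59.12},
    "-JL": {"TI": 86.56, "AI": 70.12, "ARC": 61.22},
}


def f1_drops(full: Dict[str, float],
             ablated: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Return, per variant and sub-task, the F1 lost relative to the full model."""
    return {variant: {task: round(full[task] - score, 2)
                      for task, score in scores.items()}
            for variant, scores in ablated.items()}


if __name__ == "__main__":
    for variant, drops in f1_drops(FULL_F1, ABLATED_F1).items():
        print(variant, drops)
```

Under that assumption, removing DGAT costs 2.00 F1 on EC, and removing both PI and US costs 2.61, 3.04, and 2.30 F1 on TI, AI, and ARC, the largest drops among the variants on those three sub-tasks.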
5.6 Case Study
Our event extraction framework makes use of the relationships between event types and between elements across types. By learning the relationships of element roles across event types, it recognizes elements more completely. For example, in the sentence “As the soldiers approached, the man […]”, the phrase “the man” can play different roles. When the event type is Die, its role is Agent, while when the event type is Attack, its role becomes Attacker. In our universal schema, the two argument roles are no longer distinguished and are unified as Initiator. The schema also reduces the performance fluctuations caused by uneven data distribution across element roles. For example, the role Price has only 12 samples, the fewest of all roles. Combining the roles Price and Money into the role Currency gives 211 samples, so more Price arguments can be recognized than before. However, our approach does not make the performance of all types entirely equal. It fails to identify arguments that are scattered across multiple events. For example, the sentence “Some 70 people were arrested Saturday […]” contains four event types that all have “Saturday” in the role Time-Within, and our model cannot identify every occurrence. Our future work will look into this.
6 Conclusion and Future Works
We propose AEE to learn shared information among event types and argument roles. We propose a document-level graph attention network that establishes information sharing among types by connecting the sentence nodes of different event types. For the element extraction task, we design a new schema on ACE 2005 that constructs associations among argument roles. The schema merges highly related argument roles into one and then keeps all roles in a single universal pattern for all event types. Furthermore, we design a parameter inheritance mechanism that brings event type knowledge into element extraction. Our method alleviates the uneven performance across classes caused by the long-tail distribution of the data. In future work, we will consider using reinforcement learning to learn the best extraction order of elements.