# Writing Style Aware Document-level Event Extraction

Event extraction, the technology that aims to automatically extract structured information from documents, has attracted increasing attention in many fields. Most existing works address this problem with a token-level multi-label classification framework that distinguishes tokens as different roles, while ignoring the writing styles of documents. The writing style is a particular way of organizing the content of documents, and it is relatively fixed in documents of a specific field (e.g., financial or medical documents). We argue that the writing style contains important clues for judging the roles of tokens, and that ignoring such patterns might degrade the performance of existing works. To this end, we model the writing style in documents as a distribution of argument roles, i.e., the Role-Rank Distribution, and propose an event extraction model with a Role-Rank Distribution based Supervision Mechanism that captures this pattern through the supervised training of an event extraction task. We compare our model with state-of-the-art methods on several real-world datasets. The empirical results show that our approach outperforms other alternatives with the captured patterns. This verifies that the writing style contains valuable information that can improve the performance of the event extraction task.

## Research Highlights

• We formalize the writing style in documents as the Role-Rank Distribution between the argument roles, positions, and the event triggers.

• We propose a new document-level event extraction model based on our defined Role-Rank Distribution.

• We capture the Role-Rank Distribution by training our model with the event extraction task.

• We verify that the captured Role-Rank Distribution contains valuable information that can improve the performance of the event extraction task.

## 1 Introduction

Event extraction (EE) (1) is an important task that extracts event factors (22), including triggers (17) and arguments (18), from real-world corpora. Usually, event triggers (17) are mentions that express the causes and types of events, and event arguments are named entities (e.g., person names, company names, and locations) (2) that play critical roles in an event. As a text analysis method, EE is widely used in applications such as Information Extraction (3) and Question Answering (4). The mainstream works (8; 9) divide into sentence-level and document-level event extraction according to the scope of their inputs. Further, since aligning entity labels with tokens is a combinatorial optimization problem, many existing works (9; 6; 22) include several potential features to find a good mapping between tokens and their labels. The works in (9; 6) leverage semantic features in their models, and some methods (22) adopt pre-defined information or pre-trained data to improve the performance of their models. All of these works can be viewed as refinements of token-level multi-label classifiers that distinguish the tokens in sentences as different roles.

Recently, researchers and practitioners have applied event extraction technologies to aid studies in many fields (6; 35; 36). For instance, the work in (37) applies event extraction to enhance finance-related research performed by financial analysts, and the work in (36) applies event extraction results from medical reports to make related data easily accessible and improve medical decisions. These studies show that high-quality event extraction results have the potential to boost studies in special fields. However, since documents in special fields (e.g., finance and medicine) organize their contents in relatively fixed ways, applying a general model might degrade the quality of the event extraction results, which in turn harms the downstream studies.

The writing style (11) describes a particular way of organizing the contents of a document in a specific field. It has been used in tasks such as recognizing stylistic deception (23) and predicting the success of literary works (24). To incorporate the writing style into the event extraction task, we formalize it as the Role-Rank Distribution (RRD), a distribution that describes the roles and positions of the named entities in a document. In our experiment (cf. Fig. 1), we observe that the RRD remains relatively stable for documents of a specific event type in the financial corpus CFA (Chinese Financial Announcement (10)). This indicates that the writing style is strongly connected with the event factors in financial documents of a given event type, and that it might help improve the performance of the event extraction task. However, as far as we know, none of the existing works considers this pattern or how to leverage it in their models.

To this end, we leverage the RRD in a joint event extraction model. Since our model considers the relationships among the roles and positions of the arguments and the corresponding event triggers in the given documents, it improves the performance of extracting the named entities and the event triggers simultaneously.

We test our model on real-world datasets, and it outperforms the state-of-the-art event extraction works trained on the same training set. This indicates that our method captures writing styles that help the event extraction task in specific fields.

In summary, the main contributions of our work are as follows:

• To our knowledge, we are the first to discuss the relationship between the argument roles, the argument positions, and the event triggers in field-oriented documents (e.g., financial documents), and we formalize this relationship as the Role-Rank Distribution (RRD) in the event extraction task.

• We propose a model, Event Extraction via Field-Role-Rank Distribution (EEFRRD), which extracts event information from documents by leveraging the Role-Rank Distribution (RRD) of a specific field.

• Our experiments on financial document datasets show that, by integrating the RRD pattern, our method achieves better performance on Document-level Event Extraction (DEE) tasks than the other related methods. We also observe that the improvement is even more pronounced for financial documents with strong RRD patterns.

The remainder of this paper is organized as follows: Section 2 introduces definitions for event extraction and the argument distribution. Section 3 presents our model and details the RRD. Section 4 reports comparison experiments between our method and other alternatives on several real-world financial datasets. Section 5 concludes this paper.

## 2 Preliminaries

In this section, we formally define the event extraction task and our problem.

### 2.1 Event Extraction

To start with, we provide several key notations in Table 1, following (9; 10).

To illustrate the aforementioned concepts, we show a real-world financial event table in Figure 2. Generally, event extraction is the task of filling the blanks of an event table with the correct mentions from a document. The event table contains structured information such as event types, event roles, and event arguments.

Recent event extraction works fall into sentence-level and document-level methods, which differ in the scope of their input contents.

#### 2.1.1 Sentence-level Event Extraction (SEE).

The Sentence-level Event Extraction models aim to extract named entities (16) from sentences. Sequence-to-sequence (Seq2Seq) models (15; 29; 30) are a mainstream way to implement the SEE task. The idea (15) of Seq2Seq SEE models is to translate a sentence (i.e., a token sequence) into a tag sequence in the BIO schema. Given a set of sentences $S$ and a set of pre-annotated label sequences, suppose that for an arbitrary sentence $s = (w_1, w_2, \ldots, w_{N_w})$, where the $w_i$s are tokens, there is a corresponding label sequence $(y_1, y_2, \ldots, y_{N_w})$ with each $y_i$ in the tag set $Y$. Then the task of Seq2Seq SEE is to find an optimal tagger function $f$ that minimizes the following loss function:

$$\mathcal{L}_1 = \sum_{\forall i \in [1, N_w]} \sum_{\forall y \in Y} -p_{i,y} \log\left(\hat{p}_{i,y}\right), \quad (1)$$

where $\hat{p}_{i,y}$ is the probability that $f$ annotates token $w_i$ with tag $y$, and $p_{i,y}$ is the probability that an oracle model annotates the same token with tag $y$.
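For concreteness, Eq. (1) can be sketched as a token-level cross-entropy. The snippet below is a minimal plain-Python illustration, not the authors' implementation; the function name `seq_tagging_loss`, the per-token dictionary layout, and tag names such as `B-Pledger` are hypothetical, and the small `eps` guard against `log(0)` is an added assumption.

```python
import math
from typing import Dict, List


def seq_tagging_loss(
    oracle: List[Dict[str, float]],
    predicted: List[Dict[str, float]],
    eps: float = 1e-12,
) -> float:
    """Token-level cross-entropy in the spirit of Eq. (1).

    oracle[i][y]:    p_{i,y}, the oracle probability that token w_i has tag y
    predicted[i][y]: hat p_{i,y}, the tagger's probability for the same token and tag
    Tags missing from a dictionary are treated as probability 0.
    """
    loss = 0.0
    for p_i, p_hat_i in zip(oracle, predicted):
        for tag, p in p_i.items():
            if p > 0.0:  # zero-probability oracle terms contribute nothing
                loss -= p * math.log(max(p_hat_i.get(tag, 0.0), eps))
    return loss
```

With a one-hot oracle, as in standard BIO tagging, each token contributes only the negative log-probability that the tagger assigns to its gold tag.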

As auto-annotation methods, SEE models (29) extract event arguments within the scope of a sentence, and some of them (32) reach excellent performance. However, in a real-world document, the factors (arguments, roles, event type, etc.) of an event may be scattered across the whole document rather than contained in a single sentence. This raises the argument-scattering issue (9): current SEE methods cannot easily collect all the scattered factors of an event.

#### 2.1.2 Document-level Event Extraction.

Document-level Event Extraction (DEE) (10) extends the scope of the input contexts from sentences to documents. This yields more candidate arguments with roles and thus can generate a complete event table (cf. Figure 2) from a document. Recent works (9; 10; 31) have explored building models at the document level. The work in (9) constructs a two-stage DEE framework: it first tags sequences with SEE and then uses multiple sentences to fill in missing information. The work in (10) generates an entity-based directed acyclic graph to perform DEE. Other efforts add contextual features, such as syntactic features (21), to help identify event types from the texts.

### 2.2 Problem Formulation

Our task is to extract the complete structural information (arguments, roles, event type, etc.) of an event from a document. We formalize this document-level joint event extraction process as follows.

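Given a document $d = (s_1, s_2, \ldots, s_{|d|})$, where the $s_i$s are sentences, our target is to find an optimal event table $T'$ as follows:

$$T' = \left\langle E(d), \bigcup_{s_i \in d} f(s_i) \right\rangle, \quad (2)$$

where $E(\cdot)$ is a function that identifies the event type based on the input document $d$, and $f(s_i)$ is the output of the sentence-level tagger for sentence $s_i$.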
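A toy sketch of how Eq. (2) assembles an event table. It assumes, hypothetically, that the tagger output $f(s_i)$ has already been decoded into (role, argument) pairs and that the event-type classifier $E(\cdot)$ is any callable over the sentence list; `build_event_table` and both callables are illustrative names, not part of the paper.

```python
from typing import Callable, List, Set, Tuple

Argument = Tuple[str, str]  # (role, argument mention), e.g. ("Pledger", "Company A")


def build_event_table(
    document: List[str],
    identify_event: Callable[[List[str]], str],
    tag_sentence: Callable[[str], Set[Argument]],
) -> Tuple[str, Set[Argument]]:
    """Assemble T' = <E(d), union over s_i in d of f(s_i)>, as in Eq. (2).

    identify_event plays the role of E(.), and tag_sentence plays the role of
    the sentence-level tagger f, assumed here to return decoded
    (role, argument) pairs rather than raw tag sequences.
    """
    event_type = identify_event(document)    # E(d)
    arguments: Set[Argument] = set()
    for sentence in document:                # union over all sentences s_i in d
        arguments |= tag_sentence(sentence)  # f(s_i)
    return event_type, arguments
```

The union means an argument mentioned in several sentences appears only once in the table, which is how document-level extraction gathers scattered arguments.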

As discussed in the Introduction, we observe significant differences in the writing styles of documents of different event types. However, this pattern is ignored by most existing works. We leverage it to improve the performance of our task in the following sections.

## 3 Our Proposed Model

### 3.1 Overview of Our Model

Figure 3 shows the main framework of our model. First, it extracts all the arguments with a sentence-level labeling module. Then, it uses a neural network classifier to determine the event type. After these two steps, our system obtains the candidate event records (a set of event records). We generate the target event table by filtering the obtained event records based on the distribution of roles to arguments in documents of different event types. To this end, we formalize the writing style as the Role-Rank Distribution and propose the Role-Rank Distribution based Supervision Mechanism (RRDSM) to measure the likelihood that a document belongs to a specific event type. This mechanism further improves the performance of our model.

As illustrated in Figure 3, our argument extraction module applies a BiLSTM-CRF (34) based model to extract arguments from sentence sequences. Then, our event recognition module uses a CNN (21)-based model to classify the event types of documents. Finally, we propose a self-attention augmented module that leverages the proposed Role-Rank Distribution to revise the final event type prediction. We elaborate on the details of all these modules in the following sections.

### 3.2 Role-Rank Distribution

We formalize the writing style as a statistical distribution over argument roles, argument positions (the order of the sentences in which the arguments appear), and event triggers. To facilitate the discussion, we introduce the related notations below.

#### 3.2.1 Role-Rank Score.

Given a set of documents $D$ and a set of argument roles $R$, the Role-Rank Score $\rho_D(r, k)$ ($r \in R$, $k \in [1, N_s]$, where $N_s$ is the maximum number of sentences of a document in $D$) is the conditional probability that a specific event role $r$ appears in the $k$-th sentences of all the documents in $D$.

To capture the Role-Rank Scores of documents with different event types, we extend this definition as follows.

#### 3.2.2 Role-Rank Distribution.

Given a set of event types $E$, a set of documents $D$, and a set of argument roles $R$, the Role-Rank Distribution is an $|E| \times N_s \times |R|$ tensor $\mathcal{P}$, where each element $\mathcal{P}_{e,k,r}$ is the conditional probability of an argument role $r$ ($r \in R$) appearing in the $k$-th sentence of the documents of event type $e$ ($e \in E$).

The Role-Rank Distribution reveals the ternary relationship among the event roles, the order of sentences in documents, and the event types. From the observation in the Introduction, this distribution can differ considerably across event types. Therefore, it might be useful for predicting event roles given a sentence order and an event type. In a real-world corpus, a complete Role-Rank Distribution for a set of documents under all the event types could be extremely large, which might degrade the efficiency of our prototype system. To this end, we preprocess and store the Role-Rank Distribution as the tensor $\mathcal{P}$. In the following sections, we also use the slice matrix $P_e$ to refer to the slice of the Role-Rank Distribution for a specific event type $e$ ($e \in E$).
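The following is a minimal sketch of how the per-event-type slices $P_e$ could be precomputed once from annotated training documents. It is not a reproduction of Algorithm 1: the input format (one set of gold roles per sentence), the truncation at `max_sents` ($N_s$), and the choice to normalize each role column over sentence positions are all assumptions, and `role_rank_distribution` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Matrix = List[List[float]]
# (event type, [set of gold roles in sentence 1, sentence 2, ...])
AnnotatedDoc = Tuple[str, Sequence[Set[str]]]


def role_rank_distribution(
    docs: Sequence[AnnotatedDoc], roles: Sequence[str], max_sents: int
) -> Dict[str, Matrix]:
    """Precompute one max_sents x |roles| slice P_e per event type e.

    Entry [k][j] estimates how likely role roles[j] is to appear in the
    (k+1)-th sentence of documents of type e. Each role column is
    normalized over sentence positions (an assumed normalization).
    """
    col = {r: j for j, r in enumerate(roles)}
    counts: Dict[str, Matrix] = defaultdict(
        lambda: [[0.0] * len(roles) for _ in range(max_sents)]
    )
    for event_type, sentences in docs:
        for k, sent_roles in enumerate(sentences[:max_sents]):
            for r in sent_roles:
                if r in col:  # ignore roles outside the role inventory
                    counts[event_type][k][col[r]] += 1.0
    slices: Dict[str, Matrix] = {}
    for e, mat in counts.items():
        for j in range(len(roles)):
            total = sum(mat[k][j] for k in range(max_sents))
            if total > 0:
                for k in range(max_sents):
                    mat[k][j] /= total
        slices[e] = mat
    return slices
```

Keeping the result in a dictionary keyed by event type mirrors the idea of computing the distribution once before training and looking up $P_e$ whenever it is needed.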

#### 3.2.3 Preprocessing the Role-Rank Distributions.

To reduce the time cost of our prototype system, we compute the Role-Rank Distributions in a preprocessing step before training our model. With the preprocessed results, the Role-Rank Distributions of the training data can be accessed whenever the training process requires them. We describe this process in Algorithm 1.

In our experiment, the preprocessing of Algorithm 1 usually saves 12.8% of the training time of our prototype system compared with the original online processing.

### 3.3 Role-Rank Distribution Based Supervision Mechanism

To incorporate the preprocessed Role-Rank Distribution into the supervision of our model, we propose the Correct-Annotation Likelihood, which evaluates how plausibly a document belongs to a specific event type. The core idea is to assess the output tag sequences of a Seq2Seq SEE module by referring to the preprocessed Role-Rank Distribution of the training set. This requires a Role-Rank Distribution of the Seq2Seq SEE outputs. However, since the Seq2Seq SEE module only outputs tag sequences, there is a gap between its results and the preprocessed Role-Rank Distribution of the training set. To bridge this gap, we propose the Tag-to-Role Transition to generate the required Role-Rank Distribution from the Seq2Seq SEE results.

#### 3.3.1 Tag-to-Role Transition

Given a set of documents $D$, a set of tags $Y$, and a set of event roles $R$, suppose $s$ is a sentence in document $d$ ($d \in D$) and $y_s$ is the corresponding tag sequence for $s$. Then the Tag-to-Role Transition $W$ is a $|Y| \times |R|$ matrix, where each element $W_{y,r}$ is the frequency with which tag $y$ belongs to role $r$ in $D$.
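One way to estimate $W$ is to count (tag, role) co-occurrences in the annotated training data and normalize each tag's row. The sketch below assumes that input format and that row normalization; `tag_to_role_transition` and the tag and role names in the usage are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def tag_to_role_transition(
    pairs: Iterable[Tuple[str, str]], tags: Sequence[str], roles: Sequence[str]
) -> List[List[float]]:
    """Estimate the |Y| x |R| Tag-to-Role Transition W.

    pairs holds one (tag, role) entry for every annotated token (or mention)
    in the training corpus. Row y of W is the relative frequency with which
    tag y belongs to each role; rows of tags never seen stay all zero.
    """
    counts = Counter(pairs)
    W: List[List[float]] = []
    for y in tags:
        row = [float(counts[(y, r)]) for r in roles]
        total = sum(row)
        W.append([v / total for v in row] if total > 0 else row)
    return W
```

Row normalization makes each row of $W$ a distribution over roles, so multiplying tag counts by $W$ redistributes each tag occurrence across the roles it has been observed with.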

Given a set of tag sequences output by the Seq2Seq SEE module, the Tag-to-Role Transition transforms the distribution of tags over sentence orders into a role-rank distribution. Formally, a role-rank distribution can be obtained through the transformation in Equation 3.

$$P' = P_y(d) \cdot W, \quad (3)$$

where $P_y(d)$ is an $N_s \times |Y|$ matrix that gives the distribution of tags over sentence orders in document $d$. With the Role-Rank Distribution obtained from the annotated result of the Seq2Seq SEE module, we define the Correct-Annotation Likelihood as follows.
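A plain-Python sketch of Eq. (3). It builds $P_y(d)$ by counting, for each sentence position, the tags predicted by the SEE module, then multiplies by $W$. Counting raw tag occurrences (rather than normalized frequencies) and restricting the count to the tags passed in `tags` are assumptions; passing only the `B-` tags counts each mention once. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def tag_sentence_matrix(
    tag_seqs: Sequence[Sequence[str]], tags: Sequence[str], max_sents: int
) -> Matrix:
    """P_y(d): row k counts the tags predicted in the (k+1)-th sentence of d."""
    col = {t: j for j, t in enumerate(tags)}
    P = [[0.0] * len(tags) for _ in range(max_sents)]
    for k, seq in enumerate(tag_seqs[:max_sents]):
        for t in seq:
            if t in col:  # tags outside `tags` (e.g. "O" or "I-*") are ignored
                P[k][col[t]] += 1.0
    return P


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A . B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def predicted_role_rank(
    tag_seqs: Sequence[Sequence[str]], tags: Sequence[str], W: Matrix, max_sents: int
) -> Matrix:
    """Eq. (3): P' = P_y(d) . W, an N_s x |R| role-rank matrix for document d."""
    return matmul(tag_sentence_matrix(tag_seqs, tags, max_sents), W)
```

The rows of `W` must follow the same tag order as `tags`, so that column $j$ of $P_y(d)$ lines up with row $j$ of $W$.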

#### 3.3.2 Correct-Annotation Likelihood.

Let $\mathcal{P}$ be the tensor of the preprocessed Role-Rank Distribution and $P'$ be the Role-Rank Distribution counted from the annotated result of the SEE module. Suppose $P_e$ is the slice matrix of $\mathcal{P}$ for event type $e$. Then the Correct-Annotation Likelihood $l_e$ is computed as follows:

$$l_e = \mathrm{similarity}(P', P_e) = \frac{P' \cdot P_e}{\lVert P' \rVert \times \lVert P_e \rVert}, \quad \forall e \in E, \quad (4)$$

where $e$ is a specific event type in $E$, $P' \cdot P_e$ is the element-wise (Frobenius) inner product, and $\lVert \cdot \rVert$ is the $L_2$-norm (the Frobenius norm for matrices). Note that, although we use the cosine similarity in this work, other similarity measures could also be used.
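Eq. (4) can be read as treating $P'$ and each slice $P_e$ as flattened vectors, so the cosine similarity uses the element-wise inner product and the Frobenius norm. A minimal sketch under that reading; the function names are hypothetical, and returning 0 for an all-zero matrix is an added convention.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def cosine_similarity(A: Matrix, B: Matrix) -> float:
    """Cosine similarity of two same-shaped matrices viewed as flat vectors:
    Frobenius inner product divided by the product of Frobenius norms."""
    dot = sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))
    norm_a = math.sqrt(sum(a * a for row in A for a in row))
    norm_b = math.sqrt(sum(b * b for row in B for b in row))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for an all-zero matrix (an assumption)
    return dot / (norm_a * norm_b)


def correct_annotation_likelihoods(
    P_pred: Matrix, slices: Dict[str, Matrix]
) -> Dict[str, float]:
    """Eq. (4): l_e = similarity(P', P_e) for every event type e."""
    return {e: cosine_similarity(P_pred, P_e) for e, P_e in slices.items()}
```

Because cosine similarity ignores overall scale, documents of different lengths can be compared with the same preprocessed slices.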

Our model uses the obtained Correct-Annotation Likelihoods as weights to revise the event prediction. Our experiments show that this method is effective in improving the event prediction. However, we also observe a deeper latent pattern behind the Role-Rank Distribution. Concretely, as illustrated in Figure 1, although the sentence distributions of most argument roles are significantly different, some of them can be rather alike (cf. the distributions of the roles “Pledger” and “Repurchase Amount” in Figure 1(b)). To distinguish the roles of arguments with similar distributions, we further examine the Role-Rank Distribution under different event types and find that the distribution of a role may differ significantly across event types (cf. the role “Pledger” in the event types “equity pledge” and “equity repurchase” in Figure 1(a)). In the next subsection, we propose a self-attention method that leverages this latent pattern to further improve the performance of our model.

#### 3.3.3 Self-attention Augmented Event Identification.

We propose a self-attention based module that further revises the event type prediction by leveraging the latent differences in the distribution of the same role across different event types.

Given a document $d$, a set of event types $E$, and a set of event roles $R$, the event attention $a_e$ for event type $e$ ($e \in E$) is computed as follows:

$$a_e = \mathrm{Softmax}\left(\frac{P' P_e^{\top}}{\sqrt{|R|}}\right) Q, \quad (5)$$

where $Q$ is a distribution of event roles to event types, computed as in Equation 6:

$$Q = P' W', \quad (6)$$

where $W'$ is the $|R| \times |E|$ transition matrix from event roles to event types, whose element $W'_{r,e}$ is the frequency with which role $r$ belongs to event type $e$ in $D$; $P'$ is the Role-Rank Distribution counted from the annotated result of the SEE module; and $P_e$ is the slice matrix of the preprocessed $\mathcal{P}$ for event type $e$.
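A minimal plain-Python sketch of Eqs. (5) and (6), read as scaled dot-product attention in which the rows of $P'$ attend over the rows of $P_e$ and $Q$ plays the role of the values. The row-wise softmax and the matrix shapes noted in the docstring are assumptions inferred from the equations; all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A . B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def softmax_rows(A: Matrix) -> Matrix:
    """Row-wise softmax (the normalization axis is an assumption)."""
    out = []
    for row in A:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([v / s for v in exps])
    return out


def event_attention(P_pred: Matrix, P_e: Matrix, W_prime: Matrix) -> Matrix:
    """a_e = Softmax(P' P_e^T / sqrt(|R|)) Q, with Q = P' W' (Eqs. 5 and 6).

    P_pred (P') and P_e are N_s x |R|, W_prime (W') is |R| x |E|,
    so the scores are N_s x N_s and the result a_e is N_s x |E|.
    """
    num_roles = len(P_pred[0])
    scores = matmul(P_pred, transpose(P_e))
    scaled = [[v / math.sqrt(num_roles) for v in row] for row in scores]
    Q = matmul(P_pred, W_prime)  # Eq. (6)
    return matmul(softmax_rows(scaled), Q)
```

Dividing by $\sqrt{|R|}$ follows the usual scaled dot-product convention, keeping the attention scores from growing with the number of roles.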

To compute the attentions efficiently, we concatenate the attention results into the following tensor:

$$A = \mathrm{Linear}(a_{e_1} \oplus a_{e_2} \oplus \cdots \oplus a_{e_n}), \quad (7)$$

where the $e_i$s are the event types ($e_i \in E$, $n = |E|$) and $\mathrm{Linear}(\cdot)$ is a linear transformation (i.e., a fully connected layer) that transforms the output into a column tensor.

### 3.4 Complete task and Optimization

Our complete task consists of two sub-tasks, i.e., Sentence-level Event Extraction (SEE) and Event Type Identification. The loss function for the SEE task is defined in Equation 1. The loss function for Event Type Identification is formalized as follows:

$$\mathcal{L}_2 = \sum_{\forall d \in D} \sum_{\forall e \in E} -p_{e,d} \log\left(\left(\mathrm{Softmax}\left([l_1, l_2, \ldots, l_{|E|}] \cdot A \cdot V\right)\right)_e\right), \quad (8)$$

where $p_{e,d}$ is the probability that an oracle model classifies document $d$ into event type $e$; $[l_1, l_2, \ldots, l_{|E|}]$ is an $|E|$-dimensional vector consisting of the Correct-Annotation Likelihoods of document $d$ toward the various event types; $A$ is the attention result computed from Equation 7; and $V$ represents the event extraction results of document $d$. Since the Softmax function in Equation 8 outputs an $|E|$-dimensional vector, $(\cdot)_e$ denotes its $e$-th element, which is the probability that our model classifies document $d$ into event type $e$.

By combining the two sub-tasks, our final loss is computed as follows:

$$\mathcal{L} = (1 - \lambda)\mathcal{L}_1 + \lambda \mathcal{L}_2, \quad (9)$$

where $\lambda$ is a parameter that adjusts the weight of the event identification task. By optimizing this loss function, our system obtains the tagging results and the event type simultaneously. Therefore, after training, our system outputs an approximately optimal event table for the target documents.
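Eqs. (8) and (9) can be sketched as a softmax cross-entropy over event types followed by a weighted sum of the two losses. The shapes of $A$ and $V$ are not fully specified in the text, so the sketch assumes they multiply with the likelihood vector to give one score per event type; `event_type_loss`, `total_loss`, and the batch layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(v: Sequence[float], M: Matrix) -> Vector:
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [x / s for x in exps]


def event_type_loss(
    batch: Sequence[Tuple[Vector, Matrix, Matrix, Vector]], eps: float = 1e-12
) -> float:
    """Eq. (8), summed over the documents in `batch`.

    Each item is (l, A, V, oracle): l holds the |E| Correct-Annotation
    Likelihoods of document d, A is assumed to be |E| x m and V m x |E| so
    that l . A . V yields one score per event type, and oracle holds p_{e,d}.
    """
    loss = 0.0
    for l, A, V, oracle in batch:
        probs = softmax(vec_mat(vec_mat(l, A), V))
        loss -= sum(p * math.log(max(q, eps)) for p, q in zip(oracle, probs))
    return loss


def total_loss(loss_see: float, loss_event: float, lam: float) -> float:
    """Eq. (9): L = (1 - lambda) * L_1 + lambda * L_2."""
    return (1.0 - lam) * loss_see + lam * loss_event
```

Since $\lambda$ trades off tagging against event identification, a value around 0.6 corresponds to the region where the sensitivity study in Section 4.5 reports the best performance.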

## 4 Experiment and Analysis

### 4.1 Dataset

We use the CFA dataset (Chinese Financial Announcement, 2008-2018) compiled by (10) throughout all the experiments in this work. Its text data are labeled through distant supervision based on a knowledge base together with expert summaries. The events fall into the three types listed in Table 2: Equity Repurchase (ER), Equity Underweight (EU), and Equity Pledge (EP). These texts include disclosed major news events that may have a huge impact on the behavior of companies and investors. The partition of the dataset is described below.

The dataset is divided into training, validation, and test sets in the ratio 8:1:1, which are used to train, validate, and test the model, respectively.

### 4.2 Comparison Baselines

We compare our model with several baselines under the DEE framework. The related models that adopt this framework are as follows.

• JEE (31) models the dependencies among variables of events, entities, and their relations, and performs joint inference of these variables across a document.

• DCFEE-O is a variant of DCFEE (9) that extracts one event record from a sentence.

• DCFEE-M is an improved variant of DCFEE that can obtain event records combined from multiple sentences.

• Doc2EDAG (10) adds an entity-based directed acyclic graph to organize the DEE process.

• EEFRRD-CAL is our model with the distribution supervision mechanism that uses only the Correct-Annotation Likelihood.

• EEFRRD-SAEI is our complete model with RRDSM, which additionally adopts the Self-attention Augmented Event Identification.

### 4.3 Implementation Details

For sequence tagging, we adopt the Beginning-Inside-Outside (BIO) annotation schema to extract the candidate argument set. To compare all methods fairly, we adopt the same BiLSTM module, with 4 hidden layers and 768 hidden dimensions. All models are trained with the Adam optimizer. The learning rate is …. During training, we set the batch size to 32 and the dropout rate to 0.5. In our model, we use 2 CNN layers with a convolution kernel size of 3, followed by max-pooling to capture the most significant features. When calculating our loss function, we set $\lambda$ to ….

To evaluate the performance of our model, we adopt several prevalent metrics that have been used to evaluate event extraction by comparing predicted event tables. Given the predicted event table and the ground-truth table of the same event type for one document, we compare event records one by one without replacement and count true positives, false positives, and false negatives ($TP$, $FP$, and $FN$ for short) until no record is left. Finally, we calculate the precision, recall, and F1 scores ($P$, $R$, and $F_1$ for short) as follows:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$
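The matching procedure above leaves some details open. The sketch below assumes a greedy scheme: each predicted record is paired, without replacement, with the unmatched ground-truth record that shares the most (role, argument) pairs, and $TP$, $FP$, and $FN$ are counted at the argument level, ignoring empty slots. It is an illustration, not an official scorer; all names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, Optional[str]]  # role -> argument mention (None = empty slot)


def _args(record: Record) -> Set[Tuple[str, str]]:
    """Non-empty (role, argument) pairs of a record."""
    return {(r, a) for r, a in record.items() if a is not None}


def compare_records(pred: List[Record], gold: List[Record]) -> Tuple[int, int, int]:
    """Greedy one-to-one record matching without replacement; argument-level TP/FP/FN."""
    remaining = list(gold)
    tp = fp = fn = 0
    for p in pred:
        p_args = _args(p)
        if not remaining:  # no ground-truth record left: every argument is a FP
            fp += len(p_args)
            continue
        best = max(remaining, key=lambda g: len(p_args & _args(g)))
        remaining.remove(best)
        g_args = _args(best)
        tp += len(p_args & g_args)
        fp += len(p_args - g_args)
        fn += len(g_args - p_args)
    for g in remaining:  # unmatched ground-truth records: every argument is a FN
        fn += len(_args(g))
    return tp, fp, fn


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 as defined above (0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Counts would typically be accumulated over all documents and event types before calling `prf1`, which yields micro-averaged scores.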

### 4.4 Experimental Results

The results of the comparison experiment on the dataset are shown in Table 3. We observe that with the RRDSM mechanism, our model EEFRRD-SAEI achieves a better $F_1$-score than the other DEE models. This verifies that our proposed model indeed improves the performance of document-level event extraction.

In our experiment, recognizing event types in multi-event documents is hard for the models. Table 4 shows the event classification performance of different models. EEFRRD-SAEI utilizes Role-Rank Distributions to obtain better results. As a necessary step of event table generation, event classification improves noticeably under the supervision of the proposed RRDSM.

### 4.5 Sensitivity Study

We study the influence of the hyper-parameter $\lambda$ on the EEFRRD-SAEI model on the CFA dataset. We vary $\lambda$ from 0.50 to 0.66 and test the performance of the proposed model under each setting. As shown in Figure 4, the performance of our model peaks around $\lambda = 0.6$.

### 4.6 Case Study

To examine the effectiveness of our model, we collect the argument distributions from 64 EU documents as typical cases (see Figure 5). “Oracle” denotes the corresponding ground-truth distribution in the dataset. We observe that the predicted and oracle distribution curves show similar trends and shapes for the same argument roles.

## 5 Conclusion

In this paper, we propose a document-level event extraction model that extracts complete structured information from documents. Our method leverages the writing style of field-specific documents to improve the performance of the event extraction task. We formalize the writing style as the Role-Rank Distribution (RRD), which describes the relationship between the argument roles and the positions of the arguments in the sentences. Our model then utilizes the proposed RRD in a self-attention module to revise the final event type prediction. To further improve the efficiency of our model, we preprocess the RRD of each event type for the training data. The experimental results show that our model outperforms the alternatives on document-level event extraction tasks. This verifies that the writing style of documents helps improve the extraction of event factors from documents.