Preprint version. Emotion cause analysis, including emotion cause extraction (ECE) and emotion-cause pair extraction (ECPE), has gradually attracted the attention of many researchers. It can also help guide future work, e.g., improving the quality of products or services according to the emotion causes found in user comments.
Emotion cause extraction (ECE) was first proposed by lee2010text and aims at discovering the potential cause clauses behind a certain emotion expression in the text. Earlier work viewed ECE as a trigger word detection problem and tried to solve it with the corresponding tagging techniques. Primary efforts were therefore devoted to discovering refined linguistic features [2, 8], which yielded improved performance. More recently, instead of concentrating on word-level cause detection, clause-level extraction was put forward, on the grounds that the impact of individual words in a clause can span the whole clause. While ECE has attracted increasing attention due to its theoretical and practical significance, it requires that the emotion expression annotations be given in the test set. In light of recent advances in multi-task learning, chen2018joint investigated the joint extraction of emotion categories and causes to exploit the mutual information between the two correlated tasks. xia2019emotion proposed the emotion-cause pair extraction (ECPE) task, which aims to extract all potential clause-pairs of emotion expression and corresponding cause in a document, and thereby removes the requirement of the previous ECE task that emotions be annotated before causes are extracted. xia2019emotion argue that, while co-extraction of emotion expressions and causes is important, ECPE is a more challenging problem that deserves more emphasis.
However, ECPE still suffers from two shortcomings: 1) In most cases, the emotion expression and the cause are not whole clauses but spans within clauses, so extracting clause-pairs rather than span-pairs greatly limits its application in real-world scenarios; 2) It is not enough to extract the emotion expression clause without identifying its emotion category, because the presence of an emotion clause does not necessarily convey emotional information, owing to factors such as negative polarity, sense ambiguity or rhetoric. For example, “It feels like the sky is falling right on top of me” is an emotion expression of “fear”.
In this paper, we propose a new task, Emotion-Cause Span-Pair extraction and classification (ECSP), which aims to extract the potential span-pairs of emotions and their corresponding causes in a document and to classify the emotion of each pair. ECE and ECPE can therefore be regarded as two special cases of ECSP at the clause level. Figure 1 gives an intuitive example of the differences between the ECE, ECPE and new ECSP tasks.
Inspired by recent span-based models in syntactic parsing and co-reference resolution [7, 14], we propose a span-based model to solve this new ECSP task. The key insight is to annotate each emotion and cause with its span boundary followed by its emotion category. Under this annotation, we introduce a span-based extract-then-classify (ETC) model in which emotions and causes are directly extracted and paired from the document under the supervision of target span boundaries, and the corresponding categories are then classified using their pair representations and localized context. The advantage of this method is that clause-based tasks and span-based tasks can be treated uniformly. Moreover, since the polarity is decided using the targeted span representation, the model is able to take all target words into account before making predictions, thus naturally avoiding sentiment inconsistency.
We take BERT as the default backbone network and explore the following two aspects. First, we explore the feasibility of the ECSP task under different length search schemes; the results show that performance on the ECSP task improves as the search length of the model increases, although there is still some room for improvement. Second, following previous works [5, 16], we compare our proposed ETC model with strong baselines under the clause-based search scheme. Our proposed ETC model outperforms the SOTA models on the ECE and ECPE tasks respectively and obtains reasonable results on the ECSP task. This supports the feasibility of the ECSP task and the effectiveness of our proposed ETC model.
2 Proposed model
Instead of using traditional clause-based detection methods to identify emotions and causes, we propose a span-based search scheme as follows: we are given an input document and a list of emotion-cause span-pairs, in which each emotion expression span and its corresponding cause span is annotated with its START position, its END position, and its emotion category. Each span is defined by all the tokens from its START position to its END position, inclusive.
Our goal is to find all potential span-pairs of emotions and their corresponding causes in a document and to classify the emotion of each pair. The overall architecture of the proposed ETC model is illustrated in Figure 2. The model is built on the BERT encoder: we map word embeddings into contextualized token representations using pre-trained Transformer blocks. A span classifier is first used to propose multiple candidate targets from the sentence. Then, an emotion classifier predicts the emotion label of each extracted candidate span-pair using its summarized span representation and localized context. We further study the performance of different span search schemes.
2.1 Span Representation
As mentioned before, we first obtain token features with BERT, which draws on the rich linguistic knowledge, position information, and contextual information it contains. Given a document, BERT begins by converting the sequence of tokens into a sequence of vectors. Each of these vectors is the sum of a token embedding, a positional embedding that represents the position of the token in the sequence, and a segment embedding that represents whether the token is in the source text or the auxiliary text. We only have source text, so the segment embeddings are the same for all tokens. Several Transformer layers are then applied to get the final representations.
We use the final hidden output of BERT as the representations of corresponding tokens.
The attention mechanism can quickly extract important features from sparse data, so it is widely used in natural language processing. However, because the BERT encoder already relies heavily on attention, we do not add a further attention layer, in order to save resources. Instead, we use the following two simple functions to create task-specific span features: (1) the sum of all token vectors in the span, which can usually represent its semantics; (2) max pooling, a sample-based discretization process whose objective is to down-sample an input representation (image, text, hidden-layer output matrix, etc.). For each span, a span representation is built from these features.
Here, the global context information is taken from the final hidden output of BERT and is usually represented by the vector of the first token. A length feature encodes the length of the span in number of tokens. Each component of the span representation is a span-specific feature that would be difficult to define and use in token-level models.
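As a rough illustration of the span features described above, the following sketch concatenates the element-wise sum and element-wise maximum of the token vectors in a span with a global context vector and a length embedding. The function names, the toy vectors, the lookup table `length_table`, and the order of concatenation are assumptions made for illustration; the text does not specify them.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def sum_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise sum of the token vectors."""
    return [sum(dim) for dim in zip(*vectors)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise maximum of the token vectors."""
    return [max(dim) for dim in zip(*vectors)]


def span_representation(
    tokens: Sequence[Vector],
    start: int,
    end: int,
    cls_vector: Vector,
    length_table: Dict[int, Vector],
) -> Vector:
    """Concatenate pooled span features, global context, and a length embedding.

    `start` and `end` are inclusive token indices.
    """
    span = tokens[start:end + 1]
    length = end - start + 1
    return sum_pool(span) + max_pool(span) + cls_vector + length_table[length]


# Toy example: 4 tokens with 2-dimensional hidden states (hypothetical values).
tokens = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0], [0.0, 0.0]]
cls_vector = [0.1, 0.2]
# Hypothetical length embeddings, indexed by span length in tokens.
length_table = {1: [0.0], 2: [1.0], 3: [2.0], 4: [3.0]}

rep = span_representation(tokens, 1, 2, cls_vector, length_table)
```

In a real model the token vectors would come from the encoder and the length embeddings would be learned; the lists here only stand in for those tensors.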
2.2 Jointly Extract Emotion and Cause
After obtaining the span representations, we predict the type of each span. This prediction is done identically and in parallel for every span. For each span, we compute a vector of type scores and apply the softmax function to it to obtain a distribution over span types.
The parameters of this scoring layer are learned during training.
The predicted type of each span is the type with the highest score for that span. Only spans whose predicted type is not none are selected.
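The span-type prediction step can be sketched as follows: a linear layer scores each span representation, softmax turns the scores into a distribution, and spans whose most probable type is "none" are discarded. The label set `SPAN_TYPES`, the function names, and the toy weights are assumptions for illustration, not the model's actual configuration.

```python
import math

SPAN_TYPES = ["none", "emotion", "cause"]  # assumed label set


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_span(span_vector, weights, bias):
    """Score one span with a linear layer and return (label, probabilities)."""
    scores = [
        sum(w * x for w, x in zip(row, span_vector)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SPAN_TYPES[best], probs


def select_spans(span_vectors, weights, bias):
    """Keep only the spans whose predicted type is not 'none'."""
    kept = []
    for span_id, vector in span_vectors.items():
        label, _ = classify_span(vector, weights, bias)
        if label != "none":
            kept.append((span_id, label))
    return kept


# Toy weights: one row per type, two input features (hypothetical values).
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bias = [0.5, 0.0, 0.0]
span_vectors = {(0, 1): [2.0, 0.0], (3, 6): [0.0, 3.0], (7, 7): [0.1, 0.1]}

selected = select_spans(span_vectors, weights, bias)
```

Because each span is scored independently, the loop in `select_spans` could be replaced by a single batched matrix multiplication in a tensor library.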
2.3 Emotion-Cause Classification
Finally, we obtain a set of emotion expression spans and a set of cause spans. Our goal is then to pair the two sets and construct a set of emotion-cause span-pairs with an emotion relationship. First, we take the Cartesian product of the two sets to obtain the set of all possible span-pairs.
Despite advances in detecting long distance relations using BERT or the attention mechanism, the noise induced with increasing context remains a challenge. By using a Localized Context (LC), i.e. the context between span candidates, the emotion classifier can focus on the sentence’s section that is often most discriminative for the emotion type:
Here, the pair is described by the representations of the emotion expression span and the corresponding cause span, the localized context between them, and the distance (dist) between the two spans.
For each emotion-cause span-pair, we obtain a representation by concatenating the respective span embeddings and the localized context features. Finally, we train a softmax classifier to identify emotion categories.
The parameters of this softmax classifier are learned during training.
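The pairing step can be illustrated with a small sketch: every extracted emotion span is combined with every extracted cause span, and each pair is given the tokens lying between the two spans (the localized context) and their distance. The function names, the dictionary layout, and the definition of distance as the number of tokens between the spans are assumptions for illustration.

```python
from itertools import product


def localized_context(tokens, span_a, span_b):
    """Return the token vectors strictly between two inclusive (start, end) spans."""
    first, second = sorted([span_a, span_b])
    return tokens[first[1] + 1:second[0]]


def span_distance(span_a, span_b):
    """Number of tokens between two spans (0 if they touch or overlap)."""
    first, second = sorted([span_a, span_b])
    return max(0, second[0] - first[1] - 1)


def build_pairs(tokens, emotion_spans, cause_spans):
    """Pair every emotion span with every cause span and attach pair features."""
    pairs = []
    for emotion, cause in product(emotion_spans, cause_spans):
        pairs.append({
            "emotion": emotion,
            "cause": cause,
            "context": localized_context(tokens, emotion, cause),
            "dist": span_distance(emotion, cause),
        })
    return pairs


# Toy document: 6 tokens with 1-dimensional vectors (hypothetical values).
tokens = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
pairs = build_pairs(tokens, emotion_spans=[(0, 1)], cause_spans=[(4, 5), (2, 2)])
```

In a full model, the context tokens would be pooled into a fixed-size vector and concatenated with the two span representations before the emotion classifier is applied.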
2.4 Loss Function
We evaluate on the benchmark ECE corpus (available at: http://www.hitsz-hlt.com/?page_id=694), which is the most widely used corpus for emotion cause extraction. The corpus includes annotations of emotional expressions and the corresponding emotional causes. We use the boundaries of the annotations as the start and end of the spans. Note that the presence of an emotion expression does not necessarily convey emotional information, owing to factors such as negative polarity, sense ambiguity or rhetoric. Nor does the presence of an emotion expression guarantee the existence of an emotional cause. Therefore, for each emotion expression, we also use the emotion labels provided by the corpus. Emotion expressions and causes vary in length, and the number at each length is shown in Table 1.
| Item | Annotations | Length 2 | Length 5 | Length 10 | Length 15 | Length 20 |
The precision (P), recall (R), and F1 score are used as the metrics for evaluation. These metrics in emotion cause extraction are defined by:
Here, the three quantities involved are the number of items that are predicted, the number of items annotated in the corpus, and the number of items that are correctly predicted. Unlike in previous clause-level research, in the new ECSP task an item is considered correct only if both its start and its end are correctly predicted.
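A minimal sketch of this exact-match evaluation is given below, assuming the standard definitions: precision is the number of correct items divided by the number of predicted items, recall is the number of correct items divided by the number of annotated items, and F1 is 2PR / (P + R). The function name and the representation of an item as a `(start, end)` tuple are illustrative assumptions.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of (start, end) spans."""
    predicted, gold = set(predicted), set(gold)
    # A prediction counts as correct only if both boundaries match an annotation.
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# The second prediction misses the gold end boundary by one token,
# so it is counted as wrong under exact matching.
gold_spans = [(3, 7), (10, 12)]
predicted_spans = [(3, 7), (10, 13)]
p, r, f1 = precision_recall_f1(predicted_spans, gold_spans)
```

For pair-level evaluation, the same function can be applied to tuples that hold both spans (and, for classification, the emotion label), since only tuple equality is used.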
3.3 Experimental Settings
We use the BERT-Chinese model (available at: https://github.com/huggingface/transformers) as the default backbone network, which uses 12 layers, 768-dimensional embeddings, and 12 heads per layer, resulting in a total of 110M parameters. Each span gets a span length feature, which is a learned 25-dimensional vector representing the number of tokens in that span, and each pair also gets a localized context length feature that is twice as large. We randomly divide the data in the proportion 9:1, with 9 folds as training data and the remaining fold as testing data. The following results are reported as the average over 10-fold cross-validation. We use the Adam optimizer with a linear warmup and linear decay learning rate schedule and a peak learning rate of 5e-5. Dropout with a rate of 0.1 is applied to all hidden layers of BERT and of the classifiers. The mini-batch size is 1, and early stopping with a patience of 20 evaluations on the dev set is used.
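The shape of a linear warmup and linear decay schedule can be sketched as below. Only the peak learning rate of 5e-5 comes from the settings above; the function name, the total number of steps, and the number of warmup steps are hypothetical values chosen for illustration.

```python
def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr=5e-5):
    """Learning rate at `step`: linear rise to `peak_lr`, then linear fall to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run of 1000 optimizer steps with 100 warmup steps.
schedule = [linear_warmup_decay(s, total_steps=1000, warmup_steps=100)
            for s in range(1001)]
```

The rate starts at zero, reaches its peak at the end of warmup, and returns to zero at the final step.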
3.4 Evaluation on the New ECSP Task
3.4.1 Overall Performance
Table 2 shows the performance of our proposed ETC model with different span lengths on four sub-tasks: emotion expression span extraction (EESE), emotion cause span extraction (ECSE), emotion-cause span-pair extraction (ECSPE), and emotion-cause span-pair extraction and classification (ECSP).
Given a document with n tokens, there are n(n+1)/2 possible spans. This huge search space makes the task extremely challenging. In this experiment, we use a length-restricted span (rather than just token) representation that achieves a dual goal: improving memory efficiency while still covering the large majority of the annotated spans (more than 98% of emotions, see Table 1).
Compared with ETC-5 and ETC-15, ETC-20 achieves large improvements on the ECSP task as well as on the two sub-tasks. Specifically, we find that the improvements are mainly in the recall rate on the ECSE task, which in turn leads to the large improvement in the recall rate on ECSP. The performance of the model does not decrease sharply as the length of the annotations increases, and our chosen span search scheme is far more memory efficient than a naive search over all possible spans in the input document, while still covering more than 98% of all annotations. Our scheme is linear in the document length rather than quadratic, because we restrict our proposed ETC model to spans that lie wholly within a document and have a maximum length of 20 tokens.
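To make the size of the search space concrete, the sketch below enumerates candidate spans under a maximum length and compares the count with an unrestricted search. The function name and the document length of 100 tokens are hypothetical; only the maximum length of 20 tokens comes from the text.

```python
def enumerate_spans(num_tokens, max_length):
    """List all inclusive (start, end) spans of at most `max_length` tokens."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start, min(start + max_length, num_tokens))
    ]


# Hypothetical document of 100 tokens.
restricted = enumerate_spans(100, max_length=20)
unrestricted = enumerate_spans(100, max_length=100)
print(len(restricted), len(unrestricted))  # 1810 5050
```

With a fixed maximum length, each start position contributes at most that many spans, so the number of candidates grows linearly with the document length, whereas the unrestricted count n(n+1)/2 grows quadratically.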
In addition, the model achieves an excellent F1 score of 88.71 on EESE, but its F1 score on ECSP is 3.14% lower than that on ECSPE, which indicates that it is not enough to extract the emotion without identifying the emotion category. The presence of an emotion clause does not necessarily convey emotional information explicitly, so emotions need to be classified.
3.4.2 Effect of Localized Context
As shown in Table 3, localized context slightly improves the performance of the model. The localized context takes advantage of all the information between the two spans, so it enriches the source information available when the model predicts the emotion labels, which accounts for the improved performance.
3.5 Evaluation on the Traditional Task
By relaxing the ECSP task to the clause level, we further evaluate our model by comparing it with state-of-the-art methods for the traditional ECE and ECPE tasks.
We employ Indep, a hierarchical Bi-LSTM network proposed by xia2019emotion, as the baseline for the ECPE task. The lower layer consists of a set of word-level Bi-LSTM modules, each of which corresponds to one clause and accumulates the context information for each word of the clause. An attention mechanism is then adopted to obtain a clause representation. The upper layer consists of two components: one for emotion expression extraction and another for cause extraction. Each component is a clause-level Bi-LSTM that receives the independent clause representations and feeds them to a softmax layer for emotion prediction and cause prediction. It has two interactive variants: Inter-CE, where the predictions of cause extraction are used to improve emotion extraction, and Inter-EC, where the predictions of emotion extraction are used to enhance cause extraction.
In addition to the baselines mentioned above, we also compare our proposed ETC model with several state-of-the-art methods for the ECE task, which require annotations of emotional expressions in the test set to be provided in advance. RB is a rule-based method; CB is a common-sense-based method; ConvMS-Memnet considers emotion cause analysis as a reading comprehension task and designs a multiple-slot deep memory network to model context information. CANN uses a co-attention neural network to identify emotion causes, and CANN-E eliminates the dependence of CANN on emotion annotation in the test data. HCS, proposed by yu2019multiple, uses a multiple-level hierarchical network to detect emotion causes. MANN is the current state-of-the-art method and employs a multi-attention-based model for emotion cause extraction.
3.5.2 Results and Analysis
Past clause-level models regarded the ECE task as a set of independent clause classification problems. From Table 4 (c), we find that the proportions of emotion cause clauses and non-emotion-cause clauses are 18.36% and 81.64%, respectively. This is a seriously class-imbalanced classification problem, and such models tend to predict clauses as non-emotion-cause more often. This is also the reason why their recall scores were quite low (the highest was 75.87).
By contrast, Table 4 (c) shows that our proposed ETC model scores higher on every metric than the other baselines and does not require manual annotation of the test set. This is because it can capture the relations among multiple clauses, which helps in inferring the current clause. For example, if no other clause in a document has been detected as an emotion cause, the model will increase the probability of the current clause being predicted as an emotion cause. This ultimately increases the recall score. It is clear that by removing the emotion annotations (CANN-E), the F1 score of CANN drops dramatically (by about 34.69%). In contrast, our method does not need the emotion annotations and achieves 89.57% in F1 score, outperforming the CANN-E model by 51.6%.
xia2019emotion hypothesized that expression clause extraction and cause clause extraction are not mutually independent. On the one hand, providing emotions can help to better discover the causes; on the other hand, knowing the causes may also help to extract emotions more accurately. Our proposed ETC model uses a single classifier to classify both expressions and causes, forcing the classifier to learn the intrinsic relationship between them. Thanks to BERT’s self-attention mechanism, our proposed ETC model can capture the relationships among multiple clauses. Table 4 (b) shows that our proposed ETC model improves greatly on both the expression clause extraction and cause clause extraction tasks. Compared with Indep, Inter-CE and Inter-EC, our proposed ETC model achieves large improvements on the ECPE task as well as on the two sub-tasks. Our span-based model achieves 11.57%, 24.77% and 24.5% absolute gains on the three sub-tasks compared to the best classification model, indicating the efficacy of our proposed ETC model.
4 Related Work
First of all, our work is related to extracting causes based on the emotion expressions presented in documents, i.e., emotion cause extraction (ECE). ECE was first proposed by lee2010text. Given that an emotion is often triggered by cause events and that cause events are integral parts of emotion, they proposed a linguistically driven rule-based system for emotion cause detection. To address the lack of a formal definition of events in emotion cause extraction and the absence of an open corpus for the task, gui2016event released a corpus and re-formalized the ECE task as a clause classification problem. This corpus has received much attention in subsequent studies and has become a benchmark corpus for ECE research. Based on this corpus, several traditional rule-based models [9, 12, 6], machine learning models [5, 4, 18] and deep learning models [4, 11, 19, 17, 10] were proposed. Recently, to remove the requirement that emotion expressions be annotated in the test set before cause extraction, xia2019emotion proposed the emotion-cause pair extraction (ECPE) task, which aims to extract all potential clause-pairs of emotion expression and corresponding cause in a document.
5 Conclusions and Future Work
The key idea of the task and the model is to build span-based feature representations for emotion expressions and causes in order to extract document information efficiently. Furthermore, our proposed ETC model is able to utilize information based on an overall understanding of the document and a better localized context of interactions between spans. Comprehensive empirical studies demonstrate the effectiveness of our proposed ETC model. Since our proposed ETC model has a single input structure, in the future we will explore how to incorporate discourse graphs into it to further improve performance, and we intend to annotate a large-scale emotion-cause span-pair corpus to facilitate research.
- [1] (2015) Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015.
- [2] (2010) Emotion cause detection with linguistic constructions. In Proceedings of the 23rd International Conference on Computational Linguistics, pp. 179–187.
- [3] (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186.
- [4] (2017) A question answering approach for emotion cause extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1593–1602.
- [5] (2016) Event-driven emotion cause extraction with corpus construction. In EMNLP, pp. 1639–1649.
- [6] (2014) Emotion cause detection with linguistic construction in Chinese Weibo text. In Natural Language Processing and Chinese Computing, pp. 457–464.
- [7] (2017) End-to-end neural coreference resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 188–197.
- [8] (2013) Detecting emotion causes with a linguistic rule-based approach. Computational Intelligence 29 (3), pp. 390–416.
- [9] (2010) A text-driven rule-based system for emotion cause detection. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pp. 45–53.
- [10] (2019) Context-aware emotion cause analysis with multi-attention-based neural network. Knowledge-Based Systems 174, pp. 205–218.
- [11] (2018) A co-attention neural network model for emotion cause analysis with emotional context awareness. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4752–4757.
- [12] Emocause: an easy-adaptable approach to emotion cause contexts. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pp. 153–160.
- [13] Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory 26 (1), pp. 26–37.
- [14] (2017) A minimal span-based neural constituency parser. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 818–827.
- [15] (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
- [16] (2019) Emotion-cause pair extraction: a new task to emotion analysis in texts. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1003–1012.
- [17] RTHN: a RNN-Transformer hierarchical network for emotion cause extraction. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 5285–5291.
- [18] (2017) An ensemble approach for emotion cause detection with event extraction and multi-kernel SVMs. Tsinghua Science and Technology 22 (6), pp. 646–659.
- [19] (2019) Multiple level hierarchical network-based clause selection for emotion cause extraction. IEEE Access 7, pp. 9071–9079.