A Dependency Syntactic Knowledge Augmented Interactive Architecture for End-to-End Aspect-based Sentiment Analysis

04/04/2020 ∙ Yunlong Liang, et al. ∙ Beijing Jiaotong University, Tencent

The aspect-based sentiment analysis (ABSA) task remains a long-standing challenge, which aims to extract the aspect term and then identify its sentiment orientation. In previous approaches, the explicit syntactic structure of a sentence, which reflects the syntax properties of natural language and hence is intuitively crucial for aspect term extraction and sentiment recognition, is typically neglected or insufficiently modeled. In this paper, we thus propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA. This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn). Additionally, we design a simple yet effective message-passing mechanism to ensure that our model learns from multiple related tasks in a multi-task learning framework. Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach, which significantly outperforms existing state-of-the-art methods. Besides, we achieve further improvements by using BERT as an additional feature extractor.


1 Introduction

Aspect-based sentiment analysis (ABSA) is a long-standing challenging task, which consists of two subtasks: aspect term extraction (AE) and aspect-level sentiment classification (AS). The AE task aims to extract aspect terms from the given text. The goal of the AS task is to detect the sentiment orientation over the extracted aspect terms. For example, in Figure 1, there are two aspect terms mentioned in the sentence, namely, "coffee" and "cosi sandwiches", towards which the sentiment polarity is positive and negative, respectively.

For the overall ABSA task, previous work has shown that joint approaches [He et al.2019, Luo et al.2019] can achieve better results than pipeline or integrated methods [Wang et al.2018, Li et al.2019], since joint approaches can sufficiently model the correlation between the two subtasks, i.e., AE and AS. However, these models are typically insufficient for modeling the syntax information, which reveals the internal logical relations between words and thus is intuitively pivotal to the ABSA task. For instance, in Figure 1: 1) given "sandwiches" as a part of an aspect term, "cosi" can also be extracted as a part of the aspect term through the dependency relation type compound with "sandwiches", and thus constitutes a complete aspect term together with "sandwiches", namely, "cosi sandwiches"; 2) after the aspect term is extracted, the sentiment polarity of "cosi sandwiches" can be easily classified as negative due to the opinion word "overpriced", which is pointed out by the dependency relation amod (adjectival modifier); 3) for the sentiment orientation of the other aspect term, the opinion word "better" indicates that the sentiment polarity of "coffee" is positive through a multi-hop dependency relation path connecting the two words. Clearly, for the sentiment orientation of multiple aspects, the model will not be confused if such differentiated dependency relation paths are considered appropriately. Therefore, a syntax-independent encoder may not encode such critical relational information into the final representation, which may lead to incorrect predictions.

Figure 1: An example of a dependency tree (generated by spaCy [Honnibal and Johnson2015]). For instance, the dependency relation compound between "cosi" and "sandwiches" indicates that "cosi" modifies "sandwiches". The tree can be easily converted into a dependency graph representation where words are regarded as nodes, and dependency relation types become edges.
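To make the graph construction concrete, the following is a minimal sketch (ours, not the authors' released code) that turns a spaCy dependency parse into the node/edge form described in the caption; the specific pipeline name "en_core_web_sm", the reversed arcs, and the "self" label for self-loops are assumptions for illustration.

import spacy

nlp = spacy.load("en_core_web_sm")  # any English pipeline with a dependency parser
doc = nlp("Coffee is a better deal than overpriced cosi sandwiches")

edges = []  # (head index, dependent index, relation type)
for token in doc:
    if token.i != token.head.i:                      # skip the root's self-reference
        edges.append((token.head.i, token.i, token.dep_))
        edges.append((token.i, token.head.i, token.dep_))   # reversed arc
    edges.append((token.i, token.i, "self"))                # self-loop per node

for h, d, rel in edges:
    print(f"{doc[h].text:>12} --{rel}--> {doc[d].text}")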

Recent studies with separate subtask settings have indeed shown that syntax information can benefit both the AE task and the AS task. For instance, for the AE task, Dai and Song [Dai and Song2019] manually design aspect term extraction patterns based on a few dependency relations, and then construct a large amount of auxiliary data to improve performance. To enhance the tree-structured representation for better AE performance, Luo et al. [Luo et al.2019] encode dependency relations as features by using a bidirectional gate control mechanism over dependency trees, which originates from the bidirectional LSTM [Hochreiter and Schmidhuber1997]. For the AS task, variants of the recursive neural network [Socher et al.2011] or the graph convolutional network (GCN, [Kipf and Welling2017]) are exploited to capture syntactic information from the dependency (constituent) tree of the sentence, making the representation of the target aspect richer for more accurate sentiment predictions [Dong et al.2014, Nguyen and Shirai2015, huang and carley2019, Zhang et al.2019]. Wang et al. [Wang et al.2018] utilize a syntax-directed local attention to lay more emphasis on words syntactically close to the target aspect instead of positionally close ones for further gains on the AS task. However, these studies do not simultaneously enhance the two subtasks with syntactic knowledge in a joint framework, which is beneficial to each subtask and can thus improve the overall performance of ABSA. Additionally, dependency relation types are not sufficiently exploited to improve the overall performance.

Therefore, we propose a dependency syntactic knowledge augmented interactive architecture with multi-task learning, which is able to fully exploit the syntactic knowledge and simultaneously model multiple related tasks. In particular, we design a Dependency Relation Embedded Graph Convolutional Network (DreGcn) to fully model the dependency relation as well as the dependency relation type between words in one sentence. Furthermore, we propose a simple yet more effective message-passing mechanism to ensure that our model learns from multiple different but related tasks.

We evaluate our approach on three benchmark datasets. Experimental results demonstrate the effectiveness of our model, which significantly outperforms existing systems and achieves new state-of-the-art performance. Besides, we provide further improvements by using BERT [Devlin et al.2018] as an additional feature extractor.

Our contributions can be summarized as follows:

  • We propose a novel Dependency Relation Embedded Graph Convolutional Network (DreGcn) for the overall ABSA task in a joint framework, which is capable of fully exploiting the more fine-grained linguistic knowledge (e.g., the dependency relation and type) at the relational level than vanilla GCN.

  • We propose a more effective message-passing mechanism to ensure the model learns from multiple related tasks.

  • Our approach substantially outperforms previous systems and achieves consistently state-of-the-art results on three benchmark datasets.

Figure 2: Overview of the interactive architecture. "t" denotes the iteration number and "T" denotes the maximum number of iterations in the message-passing mechanism. Document-level parts are removed compared with the original work [He et al.2019].

2 Background

2.1 Task Definition

We formulate the complete aspect-based sentiment analysis (ABSA) task as two sequence labeling subtasks, namely, aspect term extraction (AE) and aspect-level sentiment classification (AS) (aspect and opinion term co-extraction are simultaneously performed; in this paper, AE denotes these two tasks for simplicity). For the AE task, following [He et al.2019], we employ the BIO tagging scheme {BA, IA, BP, IP, O} to label all the aspect and opinion terms mentioned in the sentence (a word cannot belong to both an aspect term and an opinion term at the same time). BA and IA denote the beginning and inside of an aspect term, respectively; BP and IP denote the beginning and inside of an opinion term, respectively; and O denotes other words. For the AS task, we employ the label set {pos, neg, neu} to mark the token-level sentiment polarity, where pos, neg and neu indicate the positive, negative and neutral sentiment polarity, respectively. Given an input sentence $x = \{x_1, \dots, x_n\}$ with length $n$, our goal is to predict two tag sequences $y^{ae} = \{y_1^{ae}, \dots, y_n^{ae}\}$ and $y^{as} = \{y_1^{as}, \dots, y_n^{as}\}$, where $y_i^{ae} \in$ {BA, IA, BP, IP, O} and $y_i^{as} \in$ {pos, neg, neu}, respectively.
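As a toy illustration (ours, not taken from the datasets), the two tag sequences for the Figure 1 sentence look as follows; the "-" placeholder for non-aspect tokens in the AS sequence is only a display convention, since those tokens are ignored in the AS loss (Section 3.5).

tokens = ["Coffee", "is", "a", "better", "deal", "than",
          "overpriced", "cosi", "sandwiches"]

# AE tags: aspect terms (BA/IA), opinion terms (BP/IP), other words (O)
ae_tags = ["BA", "O", "O", "BP", "O", "O", "BP", "BA", "IA"]

# AS tags: token-level sentiment; only aspect-term tokens carry a meaningful label
as_tags = ["pos", "-", "-", "-", "-", "-", "-", "neg", "neg"]

for tok, ae, sa in zip(tokens, ae_tags, as_tags):
    print(f"{tok:>12}  {ae:>3}  {sa}")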

2.2 An Interactive Architecture with Multi-task Learning

Figure 2 shows the interactive architecture with multi-task learning proposed by [He et al.2019], which is the current state-of-the-art model for the end-to-end ABSA task (Peng et al. [Peng et al.2019] also achieve good performance on the end-to-end ABSA task, but they focus on the limited scenario where the opinion term and the corresponding aspect term need to be paired in one sentence). The Encoder Layers encode the sentence representation for multiple related tasks. The Task-specific Layers, which consist of two key components, the message-passing and opinion-passing mechanisms, produce the predictions for the different tasks.

For an input sequence, the feature extractor maps the input to a sequence of shared latent vectors $h_i^{s}$ (the iteration superscript is omitted in this description for simplicity, i.e., $h_i^{s} = h_i^{s,(t)}$). Then the task-specific AE component assigns to each token $x_i$ a probability distribution $\hat{y}_i^{ae}$ over the AE labels, where the top value of the probability distribution of each token indicates whether it is a part of any aspect term or opinion term. The output of the AS component is analogously formulated as a probability distribution $\hat{y}_i^{as}$ over the sentiment labels. Then, the message-passing mechanism updates the sequence of shared latent vectors by combining the probability distributions of the AE and AS tasks:

h_i^{s,(t)} = f_\theta([h_i^{s,(t-1)}; \hat{y}_i^{ae,(t-1)}; \hat{y}_i^{as,(t-1)}])    (1)

where $h_i^{s,(t)}$ denotes the shared latent vector corresponding to $x_i$ after $t$ rounds of message-passing; $f_\theta$ is a re-encoding function (i.e., a fully-connected layer) and [;] means concatenation.
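The following is a schematic PyTorch sketch of our reading of Eq. (1), not the released implementation; the hidden size 300, the label counts (5 AE labels, 3 AS labels) and the ReLU activation are assumptions for illustration.

import torch
import torch.nn as nn

hidden_dim, n_ae_labels, n_as_labels = 300, 5, 3          # assumed sizes
f_theta = nn.Linear(hidden_dim + n_ae_labels + n_as_labels, hidden_dim)

def message_passing_probs(h_shared, p_ae, p_as):
    """h_shared: [n, hidden_dim]; p_ae: [n, 5]; p_as: [n, 3] (softmax outputs)."""
    # concatenate the shared vector with both probability distributions, then re-encode
    return torch.relu(f_theta(torch.cat([h_shared, p_ae, p_as], dim=-1)))

n = 9
h = torch.randn(n, hidden_dim)
p_ae = torch.softmax(torch.randn(n, n_ae_labels), dim=-1)
p_as = torch.softmax(torch.randn(n, n_as_labels), dim=-1)
h_next = message_passing_probs(h, p_ae, p_as)             # updated shared vectors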

Meanwhile, the opinion information from the AE task is sent to the AS task as shown in Figure 2, which is useful to the AS task. Specifically, a self-attention matrix $a$ is employed:

a_{ij} \propto \mathbb{1}(j \neq i) \, \frac{(h_i^{as})^{\top} W \, h_j^{as} \, o_j}{|i-j|}    (2)

where $\mathbb{1}(j \neq i)$ means we only consider context words for inferring the sentiment of the target token; $W$ is the transformation matrix; $\frac{1}{|i-j|}$ is a distance-related factor; and $o_j$ is computed by summing the predicted probabilities of $\hat{y}_j^{ae}$ on the opinion-related labels (i.e., BP and IP). Eq. (2) aims to measure the semantic relevance between $h_i^{as}$ and $h_j^{as}$. Finally, $h_i^{as}$ and the attention-weighted summary $h_i'^{as} = \sum_{j} a_{ij} h_j^{as}$ are concatenated as the output representation of the AS part.
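A rough sketch of this opinion-transmission attention under the reading above; the softmax normalization and the exact form of the score are our interpretation and may differ from the original implementation.

import torch
import torch.nn as nn

hidden_dim = 300
W = nn.Parameter(torch.randn(hidden_dim, hidden_dim) * 0.01)   # transformation matrix

def opinion_attention(h_as, opinion_score):
    """h_as: [n, hidden_dim]; opinion_score: [n] = P(BP) + P(IP) per token."""
    n = h_as.size(0)
    scores = h_as @ W @ h_as.t()                                # semantic relevance
    idx = torch.arange(n, dtype=torch.float)
    dist = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs()          # |i - j|
    scores = scores * opinion_score.unsqueeze(0) / dist.clamp(min=1.0)
    scores.fill_diagonal_(-1e9)                                 # only context words (j != i)
    a = torch.softmax(scores, dim=-1)
    h_prime = a @ h_as                                          # opinion-weighted summary
    return torch.cat([h_as, h_prime], dim=-1)                   # final AS representation

out = opinion_attention(torch.randn(9, hidden_dim), torch.rand(9))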

Although the interactive architecture mentioned above has achieved state-of-the-art performance, two drawbacks remain: 1) the architecture neglects syntax modeling; and 2) the probability distribution is insufficient for passing the rich task-specific information. We thus propose a syntax augmented interactive architecture, which can fully exploit the syntax information by utilizing a dependency relation embedded graph convolutional network (DreGcn). We also design a more effective message-passing mechanism. The whole model with these two key components is elaborated in the next section.

Figure 3: Architecture of the proposed approach. The two inputs to the Encoder Layers are general-purpose and domain-specific embeddings, respectively. [;] denotes concatenation. AE: aspect term and opinion term co-extraction; AS: aspect-level sentiment classification.

3 Approach

3.1 Overview

In Figure 3, from left to right, our approach has two key components, described in detail with the callouts, to capture two important intuitions of the ABSA task. First, we carefully design a dependency relation embedded graph convolutional network (DreGcn) in the Encoder Layers, which aims to fully exploit the syntactic knowledge. Second, we propose a more effective message-passing mechanism in the Task-specific Layers to make the model learn from multiple related tasks.

3.2 Encoder Layers

To exploit the syntactic knowledge, we design a dependency relation embedded graph convolutional network, built on the GCN [Kipf and Welling2017], in the Encoder Layers. Additionally, we retain the convolutional neural network (CNN) as an alternative part, because the n-gram features at different granularities are important to the ABSA task.

GCN aggregates the feature vectors of neighboring nodes and propagates the information of a node to its first-order neighbors. For a dependency tree with $n$ nodes, an adjacency matrix $A \in \mathbb{R}^{n \times n}$ can be generated. As done in [Schlichtkrull et al.2018], we add a self-loop for each node and include the reversed direction of each dependency arc: if there is a dependency relation between node $i$ and node $j$, then $A_{ij} = A_{ji} = 1$, otherwise $A_{ij} = A_{ji} = 0$. Then a GCN layer obtains new node features by convolving the neighboring nodes' features with the following function:

h_i^{l+1} = \sigma\Big(\sum_{j \in \mathcal{N}(i)} A_{ij}\,(W^{l} h_j^{l} + b^{l})\Big)    (3)

where $i$ is the current node and $\mathcal{N}(i)$ denotes the neighborhood of node $i$ (including $i$ itself via the self-loop); $h_j^{l}$ represents the feature of node $j$ at layer $l$; $W^{l}$ and $b^{l}$ are trainable weights, mapping the feature of a node to its adjacent nodes in the graph; $W^{l} \in \mathbb{R}^{d \times d}$, $b^{l} \in \mathbb{R}^{d}$, where $d$ is the feature size; $\sigma$ is a nonlinear activation. By stacking $L$ such GCN layers, the GCN can retrieve regional ($L$-hop) features for each node.
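A minimal dense-matrix sketch of the GCN layer in Eq. (3) (ours, not the authors' implementation); the ReLU nonlinearity and the toy sizes are assumptions.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)           # W^l and b^l

    def forward(self, h, adj):
        """h: [n, dim] node features; adj: [n, n] 0/1 float adjacency (with self-loops)."""
        # adj @ (W h + b) sums the transformed features of each node's neighbours
        return torch.relu(adj @ self.linear(h))

layer = GCNLayer(300)
h = torch.randn(6, 300)
adj = torch.eye(6)                                   # self-loops only, for illustration
out = layer(h, adj)                                  # [6, 300]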

In order to model the dependency relation type, we propose to use trainable latent features to represent each dependency relation type. Specifically, we preserve a trainable relational look-up table $R \in \mathbb{R}^{m \times d_r}$, where $m$ is the number of dependency relation types and $d_r$ is the dependency relation feature size. Then, the novel DreGcn can be defined as:

h_i^{l+1} = \sigma\Big(\sum_{j \in \mathcal{N}(i)} A_{ij}\,(W^{l}\,[h_j^{l}; r_{ij}] + b^{l})\Big), \quad r_{ij} = \sum_{k=1}^{m} t_{ij}^{k} R_k    (4)

where [;] means concatenation, $r_{ij} \in \mathbb{R}^{d_r}$ is the relational feature between node $i$ and node $j$, and $t_{ij}^{k} \in \{0, 1\}$ denotes whether the $k$-th dependency relation type holds between node $i$ and node $j$ or not (accordingly, $W^{l} \in \mathbb{R}^{d \times (d + d_r)}$). In doing so, the relational feature among nodes can be reasonably modeled and updated during training.
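The following sketch implements Eq. (4) under our reading: before aggregation, each neighbour feature is concatenated with a trainable embedding of the dependency relation type connecting the two nodes. The relation-id matrix (with id 0 reserved for the self-loop/no-relation case), the sizes, and the ReLU activation are assumptions.

import torch
import torch.nn as nn

class DreGcnLayer(nn.Module):
    def __init__(self, dim, n_rel_types, rel_dim):
        super().__init__()
        self.rel_table = nn.Embedding(n_rel_types, rel_dim)   # look-up table R
        self.linear = nn.Linear(dim + rel_dim, dim)            # W^l, b^l

    def forward(self, h, adj, rel_ids):
        """h: [n, dim]; adj: [n, n] 0/1 float; rel_ids: [n, n] relation-type ids (long)."""
        n = h.size(0)
        r = self.rel_table(rel_ids)                  # [n, n, rel_dim] relation features
        h_j = h.unsqueeze(0).expand(n, n, -1)        # neighbour feature for every (i, j) pair
        msg = self.linear(torch.cat([h_j, r], dim=-1))          # W^l [h_j; r_ij] + b^l
        out = (adj.unsqueeze(-1) * msg).sum(dim=1)   # sum over neighbours j weighted by A_ij
        return torch.relu(out)

layer = DreGcnLayer(dim=300, n_rel_types=40, rel_dim=50)
h, adj = torch.randn(6, 300), torch.eye(6)
rel_ids = torch.zeros(6, 6, dtype=torch.long)        # all "self/no relation" for illustration
out = layer(h, adj, rel_ids)                         # [6, 300]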

3.3 Task-specific Layers

For the opinion-passing, we make the information of the opinion terms available to the AS task, as done in [He et al.2019]. For the message-passing, we design a more effective mechanism for information sharing between multiple related tasks. Instead of passing the predictions of the AE task and the AS task as in Eq. (1), we propose to pass the original representations, which contain a more abundant message than the probability distributions. The message-passing function is as follows:

h_i^{s,(t)} = f_\theta([h_i^{s,(t-1)}; h_i^{ae,(t-1)}; h_i^{as,(t-1)}])    (5)

where $h_i^{ae,(t-1)}$ ($h_i^{as,(t-1)}$) denotes the task-specific representation corresponding to $x_i$ after $t-1$ rounds of message-passing. The difference between the representation and the probability distribution is that the representation can be transformed into the probability only through a fully-connected layer and a softmax layer. The new message-passing mechanism makes the rich information of the AE task and the AS task available to each other, and is thus more effective for the ABSA task, as empirically verified in the Ablation Study section.

3.4 Prediction

After $T$ iterations of message-passing, the predicted results for the AE task and the AS task are generated. Clearly, we can compute the score for each task by directly counting its results. To measure the overall performance, we need to obtain the aspect term-polarity pairs. Since an extracted aspect term may be composed of several tokens and the predicted polarities of these tokens may be inconsistent, following [He et al.2019], we only take the sentiment polarity of the first token of the current aspect term as its sentiment label.
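As a concrete illustration of this pairing rule, a small sketch (ours) that extracts aspect spans from the AE tags and assigns each span the sentiment predicted for its first token:

def decode_pairs(ae_tags, as_tags):
    """ae_tags: e.g. ['BA','IA','O',...]; as_tags: e.g. ['neg','neg','-',...]."""
    pairs, i = [], 0
    while i < len(ae_tags):
        if ae_tags[i] == "BA":                          # start of an aspect term
            start = i
            i += 1
            while i < len(ae_tags) and ae_tags[i] == "IA":
                i += 1
            pairs.append(((start, i), as_tags[start]))  # sentiment of the first token
        else:
            i += 1
    return pairs

print(decode_pairs(["O", "O", "BP", "BA", "IA"], ["-", "-", "-", "neg", "pos"]))
# -> [((3, 5), 'neg')]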

3.5 Training

We simultaneously train the AE task and the AS task for message-passing. The loss function is as follows:

\mathcal{L} = -\sum_{k=1}^{N}\sum_{i=1}^{n_k}\sum_{c=1}^{C}\big( y_{k,i,c}^{ae}\log \hat{y}_{k,i,c}^{ae} + y_{k,i,c}^{as}\log \hat{y}_{k,i,c}^{as}\big)    (6)

where $N$ denotes the total number of training instances, $n_k$ denotes the number of tokens contained in the $k$-th training instance, $C$ is the class number of the corresponding task, and $y^{ae}$ ($y^{as}$) denotes the one-hot ground-truth of the AE (AS) task. In all datasets, only aspect terms have sentiment annotations. We label each token that belongs to any aspect term with the sentiment of the corresponding aspect term. During training, we only consider AS predictions on these aspect term-related tokens for computing the AS loss and ignore the sentiments predicted on other tokens, i.e., the AS term in Eq. (6) is set to zero if $y_{k,i}^{ae} \notin$ {BA, IA} [He et al.2019].
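A schematic version of this objective (ours, not the released training code): token-level cross-entropy for AE and AS, with the AS term kept only on tokens whose gold AE label is BA or IA.

import torch
import torch.nn.functional as F

def joint_loss(ae_logits, as_logits, ae_gold, as_gold, aspect_mask):
    """Shapes: logits [n, C_task]; gold labels [n] (long); aspect_mask [n] float in {0, 1}.
    as_gold entries on non-aspect tokens can be any valid index, since they are masked out."""
    loss_ae = F.cross_entropy(ae_logits, ae_gold, reduction="mean")
    loss_as = F.cross_entropy(as_logits, as_gold, reduction="none")
    loss_as = (loss_as * aspect_mask).sum() / aspect_mask.sum().clamp(min=1)
    return loss_ae + loss_as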

4 Experiments

4.1 Datasets

Table 1 shows the statistics of all datasets. We use three benchmark datasets, taken from SemEval 2014 [Pontiki et al.2014] and SemEval 2015 [Pontiki et al.2015], to evaluate the effectiveness of our approach. The opinion terms are annotated by [Wang et al.2016]. We use Laptop14, Restaurant14, and Restaurant15 to denote SemEval-2014 Laptops, SemEval-2014 Restaurants, and SemEval-2015 Restaurants, respectively.

Datasets Train Test
Sentences AT OT Sentences AT OT
Laptop14 3,048 2,373 2,504 800 654 674
Restaurant14 3,044 3,699 3,484 800 1,134 1,008
Restaurant15 1,315 1,199 1,210 685 542 510
Table 1: Dataset statistics with numbers of sentences, aspect terms (AT) and opinion terms (OT).

4.2 Experiment Settings

Word Embeddings.

For general-purpose embeddings, we use GloVe.840B.300d released by [Pennington et al.2014]. For domain-specific embeddings, we adopt the embeddings released by [Xu et al.2018] as done in [He et al.2019].

Implementation Details.

Our models (code: https://github.com/XL2248/DREGCN) are trained with the Adam optimizer [Kingma and Ba2014], and we set the batch size to 50. At the training stage, as done in [He et al.2019], we randomly sample 20% of each training set as the development set and use the remaining 80% only for training. More details are given in Appendix A. The tuning details for the number of GCN layers and the number of message-passing iterations are given in Appendix B.

Evaluation Metrics.

We employ five metrics for evaluation and report the average score over 5 runs with random initialization in all experiments, as done in [He et al.2019]. For the overall ABSA task, we compute the F1 score, denoted as F1-I, to measure the overall performance, where an extracted aspect term is taken as correct only when both the span and the sentiment are correctly identified. For the AE task, we use F1 to measure the performance of aspect term extraction and opinion term extraction, denoted as F1-a and F1-o, respectively. For the AS task, we adopt accuracy and macro-F1, denoted as acc-s and F1-s, respectively. These two metrics are computed based on the correctly extracted aspect terms from AE instead of the golden aspect terms.
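To make the overall metric concrete, a minimal sketch (ours, not evaluation code from the released repository) of F1-I over sets of ((start, end), sentiment) pairs, where a prediction counts only if both span and sentiment match:

def f1_i(pred_pairs, gold_pairs):
    pred, gold = set(pred_pairs), set(gold_pairs)
    tp = len(pred & gold)                         # span and sentiment both correct
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_i([((3, 5), "neg"), ((0, 1), "pos")], [((3, 5), "neg"), ((0, 1), "neu")]))  # 0.5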

Methods | Laptop14: F1-a / F1-o / acc-s / F1-s / F1-I | Restaurant14: F1-a / F1-o / acc-s / F1-s / F1-I | Restaurant15: F1-a / F1-o / acc-s / F1-s / F1-I
CMLA-ALSTM | 76.80 77.33 70.25 66.67 53.68 | 82.45 82.67 77.46 68.70 63.87 | 68.55 71.07 81.03 58.91 54.79
CMLA-dTrans | 76.80 77.33 72.38 69.52 55.56 | 82.45 82.67 79.58 72.23 65.34 | 68.55 71.07 82.27 66.45 56.09
DECNN-ALSTM | 78.38 78.81 70.46 66.78 55.05 | 83.94 85.60 77.79 68.50 65.26 | 68.32 71.22 80.32 57.25 55.10
DECNN-dTrans | 78.38 78.81 73.10 70.63 56.60 | 83.94 85.60 80.04 73.31 67.25 | 68.32 71.22 82.65 69.58 56.28
PIPELINE-IMN | 78.38 78.81 72.29 68.12 56.02 | 83.94 85.60 79.56 69.59 66.53 | 68.32 71.22 82.27 59.53 55.96
MNN | 76.94 77.77 70.40 65.98 53.80 | 83.05 84.55 77.17 68.45 63.87 | 70.24 69.38 80.79 57.90 56.57
INABSA | 77.34 76.62 72.30 68.24 55.88 | 83.92 84.97 79.68 68.38 66.60 | 69.40 71.43 82.56 58.81 57.38
IMN wo DE | 76.96 76.85 72.89 67.26 56.25 | 83.95 85.21 79.65 69.32 66.96 | 69.23 68.39 81.64 57.51 56.80
IMN wo doc | 78.46 78.14 73.21 69.92 57.66 | 84.01 85.64 81.56 71.90 68.32 | 69.80 72.11 83.38 60.65 57.91
IMN | 77.96 77.51 75.36 72.02 58.37 | 83.33 85.61 83.89 75.66 69.54 | 70.04 71.94 85.64 71.76 59.18
IMN+BERT | 78.47 79.05 77.18 74.56 60.53 | 85.22 86.64 84.90 76.54 71.33 | 72.55 72.43 84.37 71.28 60.76
DreGcn wo DE (Ours) | 76.30 73.92 75.83 71.05 57.48 | 83.75 84.09 80.78 71.23 67.51 | 68.63 70.09 84.25 71.29 57.70
DreGcn (Ours) | 77.78 76.62 77.18 72.27 59.66 | 84.16 85.04 81.27 72.48 68.94 | 69.36 70.75 86.03 66.89 59.71
DreGcn+CNN (Ours) | 79.45 75.40 77.86 73.46 61.60 | 85.93 86.05 81.88 73.32 70.21 | 71.00 70.55 86.16 73.35 61.06
DreGcn+CNN+BERT (Ours) | 79.78 79.21 79.37 76.37 63.04 | 87.00 86.95 83.61 75.79 72.60 | 73.30 72.60 85.25 73.02 62.37
Table 2: Model comparison; the five metrics are reported for Laptop14, Restaurant14, and Restaurant15 in turn. The baseline results are mainly retrieved from IMN [He et al.2019]. The dTrans-based pipelines and the full IMN utilize a large document-level corpus; "IMN wo doc" denotes the IMN variant without it. "wo DE" indicates without using domain-specific embeddings. "+BERT" denotes exploiting BERT-BASE features on top of "DreGcn+CNN". In the Encoder Layers of Figure 3, "IMN" means only the CNN module, and "DreGcn" means only the DREGCN module. Our results do not use any document-level corpus.

4.3 Compared Models

  • Pipeline Approach.

    {CMLA, DECNN}-{ALSTM, dTrans}: These four methods are constructed by combining two best-performing models for the two subtasks. For the AE task, we select CMLA [Wang et al.2017] and DECNN [Xu et al.2018]. The former co-extracts aspect and opinion terms by modeling their inter-dependencies. The latter utilizes a multi-layer CNN structure as the encoder with double embeddings. For the AS task, ATAE-LSTM (denoted as ALSTM for short) [Wang et al.2016] and the model from [He et al.2018] (denoted as dTrans) are used. ALSTM is an attention-based LSTM structure. dTrans introduces a large document-level corpus to improve the AS performance.

    PIPELINE-IMN: It means the pipeline setting of IMN [He et al.2019], which trains the AE task and the AS task separately.

    SPAN-pipeline [hu et al.2019]: This work investigates the three types of methods (i.e., pipeline, integrated and joint) with BERT as the backbone network and obtains the best results with the SPAN-pipeline method. We replace BERT-Large with BERT-Base in their released code to get the result.

  • Integrated Approach.

    MNN [Wang et al.2018]: It handles this task as a sequence labeling task with a unified tagging scheme.

    INABSA [Li et al.2019]: This model leverages a unified tagging scheme to integrate the two subtasks of ABSA.

    BERT+GRU [Li et al.2019]: It explores the potential of BERT for the ABSA task.

  • Joint Approach.

    DOER [Luo et al.2019]: This model employs a cross-shared unit to jointly train the two subtasks.

    IMN [He et al.2019]: It is the current state-of-the-art method, which uses an interactive architecture with multi-task learning for the end-to-end ABSA task. "IMN wo DE" and "IMN wo doc" are variants of IMN.

Row | Model | F1-I
0 | DOER [Luo et al.2019] | 59.48
1 | DreGcn+CNN (Ours) | 61.60
2 | BERT+GRU [Li et al.2019] | 60.42
3 | SPAN-pipeline [hu et al.2019] | 61.84
4 | DreGcn+CNN+BERT (Ours) | 63.04
Table 3: F1-I (%) scores on Laptop14, which is our common dataset. Some baseline results are generated by running the corresponding released code under our experimental setting (dataset).
Row | Model | Laptop14 | Restaurant14 | Restaurant15
0 | CNN | 56.66 | 66.32 | 57.91
1 | Vanilla GCN (Eq. (3)) | 57.10 | 65.00 | 56.86
2 | DreGcn (Eq. (4)) | 57.46 | 66.25 | 58.32
3 | + Opinion-passing (Eq. (2)) | 57.89 | 66.51 | 58.57
4 | + Message-passing predictions (Eq. (1)) | 58.50 | 67.36 | 57.92
5 | + Message-passing representations (Eq. (5)) | 61.60 | 70.21 | 61.06
Table 4: F1-I (%) scores of the ablation study. Each component (Rows 3–5) is added on top of DreGcn (Row 2), respectively.

4.4 Results and Analysis

Overall Performance.

Table 2 and Table 3 present the results of our models and the baseline models for the complete ABSA task. Results show that our model outperforms all baseline models, often by a large margin, on all datasets in most cases even without BERT. Since there is no syntax-based method for the overall ABSA task to compare with, we also conduct experiments in the separate subtask setting, i.e., the AE and AS tasks, which are presented in Appendix D. From Table 2 and Table 3, we can conclude:

1) For the overall performance (F1-I), Table 2 shows that "DreGcn+CNN" is able to significantly surpass the other baselines. Concretely, "DreGcn+CNN" outperforms the best F1-I results of IMN by 3.23%, 0.67%, and 1.88% on Laptop14, Restaurant14, and Restaurant15, respectively (note that our approach does not use any document-level corpus, while IMN exploits this additional corpus), suggesting that DreGcn and the message-passing mechanism have an overall positive impact on the ABSA task. We notice that the improvement of our method on Restaurant14 is marginal by contrast with IMN. The reason may be that Restaurant14 contains a large number of ungrammatical sentences (14.3%), which affect the accuracy of dependency parsing. After using BERT features, we achieve further improvements (+4.67%, +3.06%, and +3.19% compared with IMN, respectively). Besides, the results also show that domain-specific knowledge is very helpful ("IMN wo DE" vs. IMN and "DreGcn wo DE" vs. DreGcn).

2) For AE (F1-a and F1-o in Table 2), "DreGcn+CNN" performs better than the baselines in most cases. These results demonstrate the effectiveness of our model, which indeed benefits from the dependency structure information and the message-passing mechanism. This shows that the syntax information is very pivotal to the AE task.

3) For AS (acc-s and F1-s in Table 2), even though some methods (IMN and the pipeline methods with dTrans) utilize additional knowledge by joint training with document-level tasks, DreGcn still significantly surpasses the baseline methods. This suggests that our model can sufficiently model the dependency structure and indeed benefit from the message-passing mechanism. This shows that the syntax information is crucial for the AS task.

4) Table 3 shows the results of our model and other strong baselines: DOER, "BERT+GRU" and SPAN-pipeline. We find that "DreGcn+CNN" can surpass DOER and even be highly comparable with BERT-based models. Our model with BERT (Row 4) can also outperform "BERT+GRU" (Row 2) and SPAN-pipeline (Row 3), which suggests the effectiveness of our proposed approach. Besides, we investigate the impact of the BERT CLS feature at different positions in the model, which is given in Appendix C.

Ablation Study.

To investigate the impact of different components, we conduct ablation studies in Table 4, where Rows 1–2 are conducted without any informative message-passing, and the other components are added on DreGcn one at a time (Rows 3–5). From Table 4, we can conclude:

  1. Considering dependency relation types as features between nodes is helpful with considerable performance gains to the ABSA task (Row 2 vs. Row 1 & Row 0), which shows that the syntax information is very critical for both aspect term extraction and sentiment recognition.

  2. Opinion message can indeed help the AS task and thus improves the overall performance (Row 3 vs. Row 2).

  3. Message-passing makes a large contribution to the overall performance (Row 4 & Row 5 vs. Row 2).

  4. Transferring representations (our proposed message-passing mechanism) is more helpful than passing predictions (Row 5 vs. Row 4), which is intuitive since the original representations carry richer information than the probability distributions.

Case Study.

To provide an intuitive understanding of how the DreGcn works, we present some examples in Table 5. As observed in Examples 1 and 2, the "Vanilla GCN" correctly predicts the opinion term and the sentiment while it fails to produce the right aspect term. With the help of modeling the dependency relation type between "windows" and "8" (Example 1) and the conj relation (i.e., coordination via the conjunction word "and", Example 2), DreGcn can correctly handle these two cases, which suggests that the dependency relation type is indeed critical to the AE task. For the sentiment orientation of multiple aspect terms, our model is not confused when identifying the sentiment polarity in Example 3. Here, DreGcn can accurately predict the sentiment polarity because of modeling the dependency relation. For Example 4, since no opinion word is mentioned in this sentence, "device" should not be regarded as an aspect term. DreGcn avoids extracting this kind of term by aggregating information from the rich opinion and sentiment representations, which demonstrates the effectiveness of our message-passing mechanism. For Example 5, by combining the dependency relation type linking "veal" and "mushrooms" (coordination via "and") with the message-passing mechanism, DreGcn correctly handles this case even though "veal" is an uncommon word in the training corpus.

 

Examples (golden labels are marked) | Vanilla GCN (Opinion; Complete) | IMN (Opinion; Complete) | DreGcn (Opinion; Complete)
1. Biggest complaint is [windows 8] | complaint; [windows] (✗) | complaint; [windows 8] | complaint; [windows 8]
2. It is the perfect [size] and [speed] for me. | perfect; [size], None (✗) | perfect; [size], [speed] (✗) | perfect; [size], [speed]
3. [Coffee] is a better deal than overpriced [cosi sandwiches] | better, None (✗); [Coffee], [sandwiches] | better, overpriced; [Coffee], [cosi sandwiches] | better, overpriced; [Coffee], [cosi sandwiches]
4. The device speaks about itself. | None; [device] (✗) | None; [device] (✗) | None; None
5. The [veal] and the [mushrooms] were cooked perfectly. | perfectly; None (✗), [mushrooms] | perfectly; [veal] (✗), [mushrooms] | perfectly; [veal], [mushrooms]

Table 5: Case study. The "Opinion" and "Complete" columns denote the predicted opinion terms and the aspect terms with their corresponding sentiment polarities, respectively. "✗" indicates incorrect predictions.

5 Related Work

Aspect-based Sentiment Analysis.

There are two sub-tasks in ABSA, namely, the aspect term extraction task [Qiu et al.2011, Ye et al.2017, He et al.2017, Wang et al.2016, Wang et al.2017, Yin et al.2019, Yin et al.2019, Li and Lam2017, Li et al.2018, Angelidis, Stefanos and Lapata, Mirella2018, fan et al.2019, Ma et al.2019] and the aspect-level sentiment classification task [Vo and Zhang2015, Xu et al.2018, Tang et al.2019, Tang et al.2016, Wang et al.2016, Wang et al.2019, Wang et al.2019, Wang et al.2019, Wang et al.2019, Liu and Zhang2017, Chen et al.2017, Chen and Qian2019, Ma et al.2018, Li et al.2018, Li et al.2019, hu et al.2019, Li and Lu2019, Du et al.2019, Bao, Lingxian and Patrik Lambert, and Badia, Toni2019, Yang et al.2019, Sun et al.2019, Liang et al.2019, Jiang et al.2019], which have been deeply studied as two separate tasks in the past. Recently, some methods attempt to solve the overall ABSA task simultaneously. Concretely, a unified tagging scheme is applied to address it as a sequence labeling task, while the inter-dependency relation between the two tasks is not explicitly modeled [Zhang et al.2015, Wang et al.2018, Li et al.2019]. Therefore, some studies propose to take them as two sequence tagging tasks and jointly model them, generating some promising results in this direction [He et al.2019, Luo et al.2019]. However, the syntax information, which is important to the ABSA task, is not considered in their models. Although some work involves the syntax information in separate subtask settings, it does not sufficiently exploit that information to enhance the overall ABSA task. For example, [Dong et al.2014] and [Nguyen and Shirai2015] need to convert the dependency structure into a binary tree and then adjust the target aspect to be the root node, which may push the opinion word far away from the target aspect, while GCN can overcome this limitation by operating directly on the original dependency graph [huang and carley2019, Peng et al.2019, Zhang et al.2019].

Graph Convolutional Network.

GCN [Kipf and Welling2017] has been extensively studied in many natural language processing (NLP) tasks, and the ABSA task is no exception. However, most existing GCN-based methods address the separate AS task setting: Zhao et al. [Zhao et al.2019] focus on modeling the sentiment dependencies over multiple aspect terms in one sentence; Sun et al. [Sun et al.2019], Huang and Carley [huang and carley2019], Hou et al. [Hou et al.2019] and Zhang et al. [Zhang et al.2019] focus on encoding more aspect-specific representations by using a vanilla GCN (or graph attention network) on dependency graphs without considering dependency types.

Different from all the studies above, in this work, we focus on the end-to-end ABSA task and extend the vanilla GCN by embedding dependency types into the model, capturing more fine-grained linguistic knowledge (i.e., the dependency relation and type) at the relational level in a joint framework, and we obtain better performance.

6 Conclusions

In this paper, we propose a dependency syntactic knowledge augmented interactive architecture for the end-to-end ABSA task, which can fully exploit the syntax information through a well-designed dependency relation embedded graph convolutional network (DreGcn) and jointly model multiple related tasks. In addition, we design a more effective message-passing mechanism to enable our model to learn informative representations from multiple tasks. The experimental results on three benchmark datasets demonstrate the effectiveness of our proposed approach, which achieves new state-of-the-art results. Besides, using BERT as an additional feature extractor, we obtain further improvements.

Acknowledgements

Liang, Xu and Chen are supported by the National Natural Science Foundation of China (Contract 61370130, 61976015, 61473294 and 61876198), and the Beijing Municipal Natural Science Foundation (Contract 4172047), and the International Science and Technology Cooperation Program of the Ministry of Science and Technology (K11F100010).

References

  • [Angelidis, Stefanos and Lapata, Mirella2018] Stefanos Angelidis and Mirella Lapata. 2018. Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised. In EMNLP.
  • [Bao, Lingxian and Patrik Lambert, and Badia, Toni2019] Lingxian Bao and Patrik Lambert and Toni Badia. 2019. Attention and Lexicon Regularized LSTM for Aspect-based Sentiment Analysis. In ACL.
  • [Dai and Song2019] Hongliang Dai and Yangqiu Song. 2019. Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision. In ACL, 5268–5277, Florence, Italy.
  • [Devlin et al.2018] Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR.
  • [Dong et al.2014] Li Dong and Furu Wei and Chuanqi Tan and Duyu Tang and Ming Zhou and Ke Xu. 2014. Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification. In ACL.
  • [Du et al.2019] Chunning Du and Haifeng Sun and Jingyu Wang and Qi Qi and Jianxin Liao and Tong Xu and Ming Liu. 2019. Capsule Network with Interactive Attention for Aspect-Level Sentiment Classification. In EMNLP.
  • [fan et al.2019] Zhifang Fan and Zhen Wu and Xin-Yu Dai and Shujian Huang and Jiajun Chen. 2019. Target-oriented Opinion Words Extraction with Target-fused Neural Sequence Labeling. In NAACL.
  • [He et al.2018] Ruidan He and Wee Sun Lee and Hwee Tou Ng and Daniel Dahlmeier. 2018. Exploiting Document Knowledge for Aspect-level Sentiment Classification. In ACL.
  • [He et al.2017] Ruidan He and Wee Sun Lee and Hwee Tou Ng and Daniel Dahlmeier. 2017. An Unsupervised Neural Attention Model for Aspect Extraction. In ACL.
  • [He et al.2018] Ruidan He and Wee Sun Lee and Hwee Tou Ng and Daniel Dahlmeier. 2018. Effective Attention Modeling for Aspect-Level Sentiment Classification. In ACL.
  • [He et al.2019] Ruidan He and Wee Sun Lee and Hwee Tou Ng and Daniel Dahlmeier. 2019. An Interactive Multi-Task Learning Network for End-to-End Aspect-Based Sentiment Analysis. In ACL.
  • [Hochreiter and Schmidhuber1997] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  • [Honnibal and Johnson2015] Matthew Honnibal and Mark Johnson. 2015. An Improved Non-monotonic Transition System for Dependency Parsing. In EMNLP.
  • [Hou et al.2019] Xiaochen Hou and Jing Huang and Guangtao Wang and Kevin Huang and Xiaodong He and Bowen Zhou. 2019. Selective Attention Based Graph Convolutional Networks for Aspect-Level Sentiment Classification. CoRR.
  • [hu et al.2019] Mengting Hu and Shiwan Zhao and Li Zhang and Keke Cai and Zhong Su and Renhong Cheng and Xiaowei Shen. 2019. CAN: Constrained Attention Networks for Multi-Aspect Sentiment Analysis. In EMNLP.
  • [hu et al.2019] Minghao Hu and Yuxing Peng and Zhen Huang and Dongsheng Li and Yiwei Lv. 2019. Open-Domain Targeted Sentiment Analysis via Span-Based Extraction and Classification. In ACL.
  • [huang and carley2019] Binxuan Huang and Kathleen Carley. 2019. Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks. In EMNLP.
  • [Kingma and Ba2014] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR.
  • [Kipf and Welling2017] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR.
  • [Li and Lu2019] Hao Li and Wei Lu. 2019. Learning Explicit and Implicit Structures for Targeted Sentiment Analysis. In EMNLP.
  • [Li and Lam2017] Xin Li and Wai Lam. 2017. Deep Multi-Task Learning for Aspect Term Extraction with Memory Interaction. In EMNLP.
  • [Li et al.2019] Xin Li and Lidong Bing and Piji Li and Wai Lam. 2019. A unified model for opinion target extraction and target sentiment prediction. In AAAI.
  • [Li et al.2018] Xin Li and Lidong Bing and Piji Li and Wai Lam and Zhimou Yang. 2018. Aspect Term Extraction with History Attention and Selective Transformation. In IJCAI.
  • [Li et al.2019] Xin Li and Lidong Bing and Wenxuan Zhang and Wai Lam. 2019. Exploiting BERT for End-to-End Aspect-based Sentiment Analysis. In EMNLP.
  • [Li et al.2018] Xin Li and Lidong Bing and Wai Lam and Bei Shi. 2018. Transformation Networks for Target-Oriented Sentiment Classification. In ACL.
  • [Li et al.2019] Zheng Li and Xin Li and Ying Wei and Lidong Bing and Yu Zhang and Qiang Yang. 2019. Transferable End-to-End Aspect-based Sentiment Analysis with Selective Adversarial Learning. In EMNLP.
  • [Liu and Zhang2017] Jiangming Liu and Yue Zhang. 2017. Attention Modeling for Targeted Sentiment. In ACL.
  • [Luo et al.2019] Huaishao Luo and Tianrui Li and Bing Liu and Bin Wang and Herwig Unger. 2019. Improving Aspect Term Extraction with Bidirectional Dependency Tree Representation. IEEE/ACM Transactions on Audio, Speech, and Language Processing.
  • [Luo et al.2019] Huaishao Luo and Tianrui Li and Bing Liu and Junbo Zhang. 2019. DOER: Dual Cross-Shared RNN for Aspect Term-Polarity Co-Extraction. In ACL.
  • [Ma et al.2019] Dehong Ma and Sujian Li and Fangzhao Wu and Xing Xie and Houfeng Wang. 2019. Exploring Sequence-to-Sequence Learning in Aspect Term Extraction. In ACL.
  • [Ma et al.2018] Yukun Ma and Haiyun Peng and Eri Cambria. 2018. Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. In AAAI.
  • [Nguyen and Shirai2015] Thien Hai Nguyen and Kiyoaki Shirai. 2015. PhraseRNN: Phrase Recursive Neural Network for Aspect-based Sentiment Analysis. In EMNLP.
  • [Peng et al.2019] Haiyun Peng and Lu Xu and Lidong Bing and Fei Huang and Wei Lu and Luo Si. 2019. Knowing What, How and Why: A Near Complete Solution for Aspect-based Sentiment Analysis. In AAAI.
  • [Pennington et al.2014] Jeffrey Pennington and Richard Socher and Christopher Manning. 2014. Glove: Global Vectors for Word Representation. In EMNLP.
  • [Pontiki et al.2015] Maria Pontiki and Dimitris Galanis and Haris Papageorgiou and Suresh Manandhar and Ion Androutsopoulos. 2015. SemEval-2015 Task 12: Aspect Based Sentiment Analysis. In SemEval 2015.
  • [Pontiki et al.2014] Maria Pontiki and Dimitris Galanis and John Pavlopoulos and Harris Papageorgiou and Ion Androutsopoulos and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In SemEval 2014.
  • [Qiu et al.2011] Guang Qiu and Bing Liu and Jiajun Bu and Chun Chen. 2011. Opinion word expansion and target extraction through double propagation. Computational Linguistics.
  • [Schlichtkrull et al.2018] Michael Schlichtkrull and Thomas N Kipf and Peter Bloem and Rianne Van Den Berg and Ivan Titov and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference.
  • [Socher et al.2011] Richard Socher and Cliff C. Lin and Chris Manning and Andrew Y. Ng. 2011. Parsing natural scenes and natural language with recursive neural networks. In ICML.
  • [Sun et al.2019] Chi Sun and Luyao Huang and Xipeng Qiu. 2019. Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence. In NAACL.
  • [Tang et al.2016] Duyu Tang and Bing Qin and Ting Liu. 2016. Aspect Level Sentiment Classification with Deep Memory Network. In EMNLP.
  • [Tang et al.2019] Jialong Tang and Ziyao Lu and Jinsong Su and Yubin Ge and Linfeng Song and Le Sun and Jiebo Luo. 2019. Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis. In ACL.
  • [Zhang et al.2016] Meishan Zhang and Yue Zhang and Duy-Tin Vo. 2016. Gated Neural Networks for Targeted Sentiment Analysis. In AAAI.
  • [Wang et al.2018] F. Wang and M. Lan and W. Wang. 2018. Towards a One-stop Solution to Both Aspect Extraction and Sentiment Analysis Tasks with Neural Multi-task Learning. In IJCNN.
  • [Wang et al.2018] X. Wang and G. Xu and J. Zhang and X. Sun and L. Wang and T. Huang. 2018. Syntax-Directed Hybrid Attention Network for Aspect-Level Sentiment Analysis. In IEEE Access.
  • [Wang et al.2019] Hao Wang and Bing Liu and Chaozhuo Li and Yan Yang and Tianrui Li. 2019. Learning with Noisy Labels for Sentence-level Sentiment Classification. In EMNLP.
  • [Wang et al.2019] Jin Wang and Yu, Liang-Chih and Lai, K. Robert and Zhang, Xuejie. 2019. Investigating Dynamic Routing in Tree-Structured LSTM for Sentiment Analysis. In EMNLP.
  • [Wang et al.2019] Jingjing Wang and Changlong Sun and Shoushan Li and Xiaozhong Liu and Luo Si and Min Zhang and Guodong Zhou. 2019. Aspect Sentiment Classification Towards Question-Answering with Reinforced Bidirectional Attention Network. In ACL.
  • [Wang et al.2017] Wenya Wang and Sinno Jialin Pan and Daniel Dahlmeier and Xiaokui Xiao. 2017. Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In AAAI.
  • [Wang et al.2016] Wenya Wang and Sinno Jialin Pan and Daniel Dahlmeier and Xiaokui Xiao. 2016. Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis. In EMNLP.
  • [Wang et al.2016] Yequan Wang and Minlie Huang and Xiaoyan Zhu and Li Zhao. 2016. Attention-based LSTM for Aspect-level Sentiment Classification. In EMNLP.
  • [Wang et al.2019] Yequan Wang and Aixin Sun and Minlie Huang and Xiaoyan Zhu. 2019. Aspect-level Sentiment Analysis Using AS-Capsules. In WWW.
  • [Xu et al.2018] Hu Xu and Bing Liu and Lei Shu and Philip S Yu. 2018. Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction. In ACL.
  • [Chen et al.2017] Peng Chen and Zhongqian Sun and Lidong Bing and Wei Yang. 2017. Recurrent Attention Network on Memory for Aspect Sentiment Analysis. In EMNLP.
  • [Tang et al.2016] Duyu Tang and Bing Qin and Xiaocheng Feng and Ting Liu. 2016. Effective LSTMs for Target-Dependent Sentiment Classification. In COLING.
  • [Vo and Zhang2015] Duy-Tin Vo and Yue Zhang. 2015. Target-Dependent Twitter Sentiment Classification with Rich Automatic Features. In IJCAI.
  • [Yang et al.2019] Chao Yang and Hefeng Zhang and Bin Jiang and Keqin Li. 2019. Aspect-based sentiment analysis with alternating coattention networks. Information Processing and Management.
  • [Ye et al.2017] Hai Ye and Zichao Yan and Zhunchen Luo and Wenhan Chao. 2017. Dependency-Tree Based Convolutional Neural Networks for Aspect Term Extraction. In PAKDD.
  • [Yin et al.2019] Yichun Yin and Chenguang Wang and Ming Zhang. 2019. PoD: Positional Dependency-Based Word Embedding for Aspect Term Extraction. CoRR.
  • [Yin et al.2019] Yichun Yin and Furu Wei and Li Dong and Kaimeng Xu and Ming Zhang and Ming Zhou. 2019. Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction. CoRR.
  • [Zhang et al.2019] Chen Zhang and Qiuchi Li and Dawei Song. 2019. Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks. In EMNLP.
  • [Zhang et al.2015] Meishan Zhang and Yue Zhang and Duy-Tin Vo. 2015. Neural Networks for Open Domain Targeted Sentiment. In EMNLP.
  • [Zhao et al.2019] Pinlong Zhao and Linlin Hou and Ou Wu. 2019. Modeling Sentiment Dependencies with Graph Convolutional Networks for Aspect-level Sentiment Classification. CoRR.
  • [Chen and Qian2019] Zhuang Chen and Tieyun Qian. 2019. Transfer Capsule Network for Aspect Level Sentiment Classification. In ACL.
  • [Liang et al.2019] Yunlong Liang and Fandong Meng and Jinchao Zhang and Jinan Xu and Yufeng Chen and Jie Zhou. 2019. A Novel Aspect-Guided Deep Transition Model for Aspect Based Sentiment Analysis. In EMNLP.
  • [Sun et al.2019] Kai Sun and Richong Zhang and Mensah, Samuel and Yongyi Mao and Xudong Liu. 2019. Aspect-Level Sentiment Analysis Via Convolution over Dependency Tree. In EMNLP.
  • [Jiang et al.2019] Qingnan Jiang and Lei Chen and Ruifeng Xu and Xiang Ao and Min Yang. 2019. A Challenge Dataset and Effective Models for Aspect-Based Sentiment Analysis. In EMNLP.