Aspect Sentiment Triplet Extraction Using Reinforcement Learning

08/13/2021 ∙ by Samson Yu Bai Jian, et al. ∙ Singapore University of Technology and Design

Aspect Sentiment Triplet Extraction (ASTE) is the task of extracting triplets of aspect terms, their associated sentiments, and the opinion terms that provide evidence for the expressed sentiments. Previous approaches to ASTE usually extract all three components simultaneously, or first identify the aspect and opinion terms and then pair them up to predict their sentiment polarities. In this work, we present a novel paradigm, ASTE-RL, by regarding the aspect and opinion terms as arguments of the expressed sentiment in a hierarchical reinforcement learning (RL) framework. We first focus on sentiments expressed in a sentence, then identify the target aspect and opinion terms for that sentiment. This takes into account the mutual interactions among the triplet's components while improving exploration and sample efficiency. Furthermore, this hierarchical RL setup enables us to deal with multiple and overlapping triplets. In our experiments, we evaluate our model on existing datasets from laptop and restaurant domains and show that it achieves state-of-the-art performance. The implementation of this work is publicly available at




Code Repositories


This repository contains the source codes for the paper: "Aspect Sentiment Triplet Extraction using Reinforcement Learning" published at CIKM 2021.


1. Introduction

Aspect-based sentiment analysis (ABSA) or target-based sentiment analysis (TBSA) is an important research area in natural language processing (NLP) (Poria et al., 2020; Duan et al., 2021). It consists of various fine-grained sentiment analysis tasks (Nasukawa and Yi, 2003; Liu et al., 2010; Liu, 2012), with the three most fundamental being aspect/target term extraction, opinion term extraction and aspect/target term sentiment classification. Aspect Sentiment Triplet Extraction (ASTE) is a relatively new subtask of ABSA introduced by Li et al. (2019a); Peng et al. (2020). In ASTE, the task is to extract triplets containing aspect terms, their associated sentiment polarities, and the opinion terms that express those sentiments. A sentence may contain multiple such triplets where the aspect terms or opinion terms across triplets may overlap with each other. We include an example of such triplets in Table 1.

Appetizers are excellent ; you can make a great
( but slightly expensive ) meal out of them .
[Aspect ; Opinion ; Sentiment]
(1) Appetizers ; excellent ; positive
(2) meal ; great ; positive
(3) meal ; slightly expensive ; negative
Table 1. An Example of ASTE Triplets Present in a Sentence
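To make the notion of overlapping triplets concrete, the following minimal sketch stores the triplets of Table 1 in a small data structure and finds the aspect terms shared by more than one triplet. The `Triplet` type and the `overlapping_aspects` helper are hypothetical names introduced here for illustration and are not part of ASTE-RL.

```python
from collections import Counter
from typing import List, NamedTuple


class Triplet(NamedTuple):
    """One ASTE triplet (illustrative container, not from the paper)."""
    aspect: str
    opinion: str
    sentiment: str


# The three triplets listed in Table 1.
triplets = [
    Triplet("Appetizers", "excellent", "positive"),
    Triplet("meal", "great", "positive"),
    Triplet("meal", "slightly expensive", "negative"),
]


def overlapping_aspects(items: List[Triplet]) -> List[str]:
    """Return the aspect terms that appear in more than one triplet."""
    counts = Counter(t.aspect for t in items)
    return sorted(aspect for aspect, n in counts.items() if n > 1)


shared = overlapping_aspects(triplets)
print(shared)
```

Here the aspect term "meal" takes part in two triplets with opposite polarities, which is the kind of overlap the task has to handle.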

Existing methods such as CMLA+ (Wang et al., 2017), RINANTE+ (Dai and Song, 2019), Li-unified-R (Li et al., 2019a), WhatHowWhy (Peng et al., 2020), OTE-MTL (Zhang et al., 2020), GTS (Wu et al., 2020), JET (Xu et al., 2020), TOP (Huang et al., 2021) and BMRC (Chen et al., 2021) are mainly divided into simultaneous and sequential methods. Early works (Wang et al., 2017; Dai and Song, 2019; Li et al., 2019a; Peng et al., 2020; Zhang et al., 2020; Wu et al., 2020) usually employ a two-stage approach where they first simultaneously extract aspect terms with sentiments and opinion terms. The triplets are subsequently decoded through triplet classification or pairwise matching. Recent works (Huang et al., 2021; Chen et al., 2021) have shifted towards a more restrictive, multi-stage and sequential process during the extraction stage that can potentially capture more mutual dependencies and correlations among the triplet's components while forgoing the triplet decoding stage.

In this work, we tackle the ASTE task using a novel paradigm, ASTE-RL, where we consider the aspect and opinion terms as arguments of the sentiments expressed in a sentence. Previous approaches usually extract all three components simultaneously, or first identify the aspect and opinion terms and then pair them up to predict their sentiment polarities. Unlike previous approaches, we propose a hierarchical reinforcement learning (RL) framework (Takanobu et al., 2019; Duan et al., 2020) where we first consider the sentiment polarities, then identify their associated opinion and aspect terms using separate RL processes. This process is repeated to extract all triplets present in a sentence. With this hierarchical RL setup, the model can handle multiple and overlapping triplets, and model the interactions between the three components effectively. Inspired by the recent success of the multi-turn machine reading comprehension (MRC) framework (Li et al., 2019b; Chen et al., 2021), we incorporate ideas from it to further improve these mutual interactions.

2. Proposed Framework

2.1. Overview

We divide our framework ASTE-RL into three components: 1) aspect-oriented sentiment classification, 2) opinion term extraction and 3) aspect term extraction. For the sentiment classification component, the sentiment is expressed towards the aspect term, and has four possible labels: the positive, negative and neutral polarities, plus a label indicating that no sentiment is expressed. Our opinion and aspect extraction components are sequence labeling models with a BIO tagging scheme (Ramshaw and Marcus, 1999). With this BIO scheme, we have three different labels to tag an input sequence for the opinion/aspect terms: B, I and O. For a given sentence, ASTE-RL aims to output a set of labels, one per predicted triplet, where each label comprises the tagging labels for the opinion term, the tagging labels for the aspect term, and the sentiment polarity.

The three components are structured in a two-level hierarchy (Takanobu et al., 2019). In the higher level, we have the sentiment indicator. During the sequential scan of a sentence, an agent decides at each token position whether it has gathered sufficient information to mark the position as indicative of a sentiment that is expressed towards an aspect term. If not, the agent marks the position with the no-sentiment label. Otherwise, it marks the position with one of the sentiment polarities. In the latter case, the agent launches two subtasks in the lower level for the opinion and aspect extractions to identify the terms as arguments of the sentiment and engages in sequence labeling. Upon completion, the agent returns to the high-level sentiment indication process and continues the sequential scan of the sentence. This process is well-suited to be formulated as a semi-Markov decision process (Sutton et al., 1999b): 1) a high-level RL process that detects a sentiment indicator in a sentence; 2) two low-level RL processes that identify the opinion and aspect terms separately for the corresponding sentiment.
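The control flow of this two-level hierarchy can be sketched as follows. This is only an illustration of the scan-and-launch structure under simplifying assumptions: the policies are passed in as plain callables, the label names (`NONE`, `POS`) are hypothetical, and the toy policies at the bottom are hand-written stand-ins for learned ones.

```python
from typing import Callable, List, Tuple

NONE = "NONE"  # hypothetical name for the "no sentiment" label

HighPolicy = Callable[[List[str], int], str]
LowPolicy = Callable[[List[str], int, str], List[str]]


def extract_triplets(
    tokens: List[str],
    high_policy: HighPolicy,
    opinion_policy: LowPolicy,
    aspect_policy: LowPolicy,
) -> List[Tuple[str, List[str], List[str]]]:
    """Scan the sentence once; launch two subtasks per detected sentiment."""
    triplets = []
    for position in range(len(tokens)):
        option = high_policy(tokens, position)  # high-level decision
        if option == NONE:
            continue  # nothing detected here, keep scanning
        # Low-level subtasks: tag the opinion term, then the aspect term.
        opinion_tags = opinion_policy(tokens, position, option)
        aspect_tags = aspect_policy(tokens, position, option)
        triplets.append((option, opinion_tags, aspect_tags))
    return triplets


# Toy stand-in policies, for illustration only.
def toy_high(tokens, i):
    return "POS" if tokens[i] == "excellent" else NONE


def toy_opinion(tokens, i, option):
    return ["B" if j == i else "O" for j in range(len(tokens))]


def toy_aspect(tokens, i, option):
    return ["B" if j == 0 else "O" for j in range(len(tokens))]


result = extract_triplets(
    ["Appetizers", "are", "excellent"], toy_high, toy_opinion, toy_aspect
)
print(result)
```

Because the scan continues after each pair of subtasks returns, several triplets, including ones that reuse the same aspect or opinion tokens, can be produced for one sentence.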

2.2. Aspect-Oriented Sentiment Classification with High-Level RL

The high-level RL policy aims to detect the aspect-oriented sentiments in a sentence. This can be seen as an RL policy over options, where options are high-level actions (Sutton et al., 1999b).

Option: The option is selected from the sentiment polarities together with a null option, which indicates that no sentiment is expressed towards any aspect term.

State: The state at each time step is represented by: 1) the current hidden state, 2) the current part-of-speech (POS) tag, 3) the sentiment polarity vector, and 4) the high-level state for the previous time step. To obtain the POS tag for each token in a sentence, we pass the sentence into the spaCy POS tagger. The sentiment polarity vector is the embedding of the latest option where . Both the POS tag and sentiment embeddings are learned parameters in the model. Hence, the state is formally obtained by applying a non-linear function, implemented by an MLP, to these four components.

The hidden state is obtained from a pre-trained BERT model (Devlin et al., 2018) with Whole Word Masking and fine-tuned on the SQuAD v1.1 training set (Rajpurkar et al., 2016). Specifically, we first pass the query “Which tokens indicate sentiments relating pairs of aspect spans and opinion spans?” together with the review sentence into the BERT tokenizer to get the final input. We then pass this input into the BERT model, and the hidden state for a token is the output vector from the BERT model that corresponds to that token. The initial state is initialized as: .

Policy: The stochastic policy for sentiment detection specifies a probability distribution over the options given the current high-level state.
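As a rough illustration of what a stochastic policy over options looks like in code, the sketch below turns one score per option into a probability distribution and samples from it. The softmax parameterization, the option names in `OPTIONS`, and the example scores are assumptions made here for illustration; the policy equation itself is not reproduced in this text.

```python
import math
import random
from typing import List, Tuple

# Hypothetical option names: one null option plus three polarities.
OPTIONS = ["NONE", "POS", "NEG", "NEU"]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_option(scores: List[float], rng: random.Random) -> Tuple[str, List[float]]:
    """Sample one option from the distribution induced by the scores."""
    probs = softmax(scores)
    option = rng.choices(OPTIONS, weights=probs, k=1)[0]
    return option, probs


rng = random.Random(0)
# In a real model the scores would come from the high-level state;
# here they are made-up numbers.
option, probs = sample_option([1.0, 2.0, 3.0, 0.0], rng)
print(option, [round(p, 3) for p in probs])
```

Sampling, rather than always taking the most probable option, is what lets the agent explore alternative trajectories during RL training.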


Reward: At every time step, when an option is executed, the intermediate reward provided by the environment follows this:


If a sentiment that is expressed towards an aspect term is detected at a time step (i.e. the sampled option is not the no-sentiment option), the agent will launch two subtasks as low-level RL processes. When the subtasks are completed, the agent will return to the high-level RL process. Otherwise, the agent continues its sequential scan of the sentence until the option for the last word of the sentence is sampled. When all options are sampled (i.e. at the end of the combined hierarchical RL process), there is a final reward for the high-level process, given by the F1 score, which is the harmonic mean of the precision and recall in terms of the sentiment(s) in a sentence.
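The harmonic mean of precision and recall over the sentiments of a sentence can be computed as in the sketch below. How predicted and gold sentiments are matched is not spelled out here, so the sketch assumes a simple multiset match on the polarity labels; the function name `sentiment_f1` is hypothetical.

```python
from collections import Counter
from typing import List


def sentiment_f1(predicted: List[str], gold: List[str]) -> float:
    """F1 between predicted and gold sentiment labels of one sentence.

    Assumption: labels are matched as multisets, ignoring positions.
    """
    if not predicted or not gold:
        return 0.0
    matched = sum((Counter(predicted) & Counter(gold)).values())
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Two of the three gold sentiments are predicted, with no false positives:
# precision = 1, recall = 2/3, so F1 = 0.8.
reward = sentiment_f1(["positive", "positive"], ["positive", "positive", "negative"])
print(round(reward, 4))
```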


2.3. Opinion and Aspect Extractions with Low-Level RL

Every time the high-level policy detects an aspect-oriented sentiment, two low-level policies separately extract the corresponding opinion and aspect terms for that sentiment. In this subsection, we generalize the RL elements such that they apply to both low-level RL processes, unless otherwise stated.

Action: The action at every time step is to assign a tag to the current word. The action is selected from {B, I, O}, following a BIO tagging scheme. The symbols B and I represent the beginning and inside of an opinion/aspect term respectively, while the symbol O represents the unmarked label.

State: Similar to the high-level policy, the state at each time step is represented by: 1) the current hidden state, 2) the current POS tag, 3) the opinion/aspect tag vector, and 4) the low-level state for the previous time step. To enhance the interactions between the sentiment and its associated opinion/aspect terms, we add a context vector to the state at each time step, computed from the sentiment state representation assigned to the latest option. We also add the output vector from the BERT model for the [CLS] token, while computing the output vectors for the hidden states. Hence, the state is formally obtained by applying a non-linear function, implemented by an MLP, to these components.

Note that the representations used to compute the first low-level states for the opinion and aspect extractions are different. Each of the two has its own initialization, and these initializations help us capture interactions between the triplet’s components. While the state function is non-linear, the two other mappings used here are linear functions, each implemented by a single linear layer. The hidden state is obtained in the same way as in the high-level RL process. However, the queries are changed. The query is “What is the opinion span for the sentiment indicated at ?” for opinion term extraction and “What is the aspect span for the sentiment indicated at ?” for aspect term extraction.

Policy: The stochastic policy for opinion/aspect extraction specifies a probability distribution over the actions given the low-level state and the high-level option that launches the current subtask.


Reward: At every time step, when an action is executed, the intermediate reward is computed as the prediction error over the gold labels, with a weight that depends on the aspect/opinion tag type. This enables the model to learn a policy that emphasizes the prediction of B and I tags and avoids only predicting O tags in a trivial manner. When all actions are sampled, there is a final reward for the low-level processes.


There will also be negative rewards in the cases where the low-level processes produce impossible predictions, namely cases where there is no B tag or more than one B tag present, and where no or more than one opinion/aspect term is identified for a predicted triplet. Note that the low-level rewards are non-zero only in the case where the option from the high-level process is correctly predicted.
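A check for such impossible predictions could look like the sketch below. It encodes one reading of the rule, namely that a valid tag sequence for a single triplet contains exactly one B tag followed by an unbroken run of I tags and no I tags anywhere else. The function name and this exact formulation are assumptions for illustration.

```python
from typing import List


def is_valid_term_tagging(tags: List[str]) -> bool:
    """True if the BIO tags mark exactly one contiguous term."""
    if tags.count("B") != 1:
        return False  # no term at all, or more than one term started
    start = tags.index("B")
    end = start + 1
    while end < len(tags) and tags[end] == "I":
        end += 1  # extend over the I tags that continue the term
    # Any I tag outside the single term makes the sequence invalid.
    return "I" not in tags[:start] and "I" not in tags[end:]


examples = {
    "one term": ["O", "B", "I", "O"],
    "no term": ["O", "O"],
    "two terms": ["B", "O", "B"],
    "I before B": ["I", "B"],
    "detached I": ["B", "O", "I"],
}
validity = {name: is_valid_term_tagging(tags) for name, tags in examples.items()}
print(validity)
```

A training loop could use such a predicate to decide when to apply the negative reward, although the exact penalty value is not given in this text.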

2.4. Hierarchical Policy Learning

We learn the high-level policy by maximizing the expected total reward at each time step as the agent samples trajectories following the high-level policy. Likewise, we learn the low-level policies by maximizing the expected total reward at each time step as the agent samples trajectories following the low-level policies. We then optimize all policies using policy gradient methods (Sutton et al., 1999a) with the REINFORCE algorithm (Williams, 1992; Takanobu et al., 2019).
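For readers unfamiliar with REINFORCE, the sketch below computes the return at each time step of one sampled trajectory and the corresponding surrogate loss, whose gradient is the policy gradient estimate. It is a flat, single-level simplification: the hierarchical treatment of options and subtasks is not reproduced, and the discount factor `gamma` and the example numbers are assumptions.

```python
from typing import List, Tuple


def reinforce_loss(
    log_probs: List[float], rewards: List[float], gamma: float = 1.0
) -> Tuple[float, List[float]]:
    """Return the REINFORCE surrogate loss and the per-step returns.

    log_probs[t] is the log-probability of the action sampled at step t,
    rewards[t] is the reward received after that action.
    """
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running  # total (discounted) future reward
        returns.append(running)
    returns.reverse()
    # Minimizing this loss increases the probability of actions
    # that were followed by high returns.
    loss = -sum(lp * g for lp, g in zip(log_probs, returns))
    return loss, returns


loss, returns = reinforce_loss([-1.0, -1.0, -1.0], [0.0, 1.0, 2.0])
print(loss, returns)
```

In an automatic differentiation framework the log-probabilities would be tensors, so that backpropagating through this loss updates the policy parameters.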

2.5. Training Procedure

We pre-train our ASTE-RL models for 40 epochs with a learning rate of 2e-5. During pre-training, we give our model the ground-truth options or actions at every time step to limit the exploration of the agent, given the high-dimensional state space in our setup. This prevents the agent from exploring too many unreasonable cases, e.g. an I tag preceding a B tag, and from learning too slowly. We then fine-tune the best model (chosen based on the Dev F1 score) with the RL policy for 15 epochs with a learning rate of 5e-6. We sample 5 trajectories for each data point during RL fine-tuning.

We initialize the BERT parameters from pre-trained weights (Devlin et al., 2018) and update them during training for this task. We set the dimension of sentiment polarity and opinion/aspect tag embeddings at 300. For POS embeddings, we set the dimension at 25. We randomly initialize these embeddings and update them during training. We set the dimension of the high-level and low-level state vectors at 300. We apply dropout (Srivastava et al., 2014) after the non-linear activations in the high-level and low-level state functions during training and set the dropout rate at 0.5. We train our models in mini-batches of size 16 and optimize the model parameters using the Adam optimizer (Kingma and Ba, 2014).

3. Experiments and Analysis

3.1. Datasets & Evaluation Metrics

            14Lap             14Rest            15Rest            16Rest
            Train  Dev  Test  Train  Dev  Test  Train  Dev  Test  Train  Dev  Test
#sentence     906  219   328   1266  310   492    605  148   322    857  210   326
#positive     817  169   364   1692  404   773    783  185   317   1015  252   407
#negative     517  141   116    480  119   155    205   53   143    329   76    78
#neutral      126   36    63    166   54    66     25   11    25     50   11    29
Table 2. ASTE-Data-V2 Dataset Statistics

We use the ASTE-Data-V2 dataset curated by Xu et al. (2020) to show the effectiveness of ASTE-RL in two different domains of English reviews, namely the laptop and restaurant domains. 14Rest, 15Rest and 16Rest are the datasets of the restaurant domain and 14Lap is of the laptop domain. We include the statistics of the four datasets in ASTE-Data-V2 in Table 2, where #sentence represents the number of sentences, and #positive, #negative, and #neutral represent the numbers of triplets with positive, negative, and neutral sentiment polarities respectively.

We process the sentences with BERT’s WordPiece tokenizer (Wu et al., 2016) to make them work for ASTE-RL. Since the WordPiece tokenization may break down the tokens in the original dataset into subwords, we need to align the opinion/aspect term annotations with our BIO tagging scheme. We tag every subword token that corresponds to the opinion/aspect term tokens in the original annotations with I, except for the first token, which we tag with B.
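The alignment between word-level annotations and subword tokens can be sketched as follows. To keep the example self-contained, the subword pieces are supplied by hand instead of coming from a real tokenizer, so the particular splits shown are illustrative and not actual WordPiece output; `align_bio` is a hypothetical helper name, and the term is assumed to cover contiguous words.

```python
from typing import List, Set


def align_bio(word_pieces: List[List[str]], term_word_indices: Set[int]) -> List[str]:
    """Produce one BIO tag per subword for a single opinion/aspect term.

    word_pieces[i] holds the subword pieces of the i-th original word;
    term_word_indices holds the indices of the words inside the term.
    """
    tags: List[str] = []
    first = True
    for word_index, pieces in enumerate(word_pieces):
        for _ in pieces:
            if word_index in term_word_indices:
                # Only the very first subword of the term gets B.
                tags.append("B" if first else "I")
                first = False
            else:
                tags.append("O")
    return tags


# Hand-written pieces for "slightly expensive meal" (illustrative split).
pieces = [["slight", "##ly"], ["expensive"], ["meal"]]
tags = align_bio(pieces, {0, 1})
print(tags)
```

Note that the second piece of the first word receives I rather than B, so a term that was one word in the original annotation can span several tags after tokenization.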

We follow the evaluation metrics of Xu et al. (2020) for our experiments. An extracted triplet is correct if the entire aspect term, opinion term, and sentiment polarity match with a ground-truth triplet. We report precision, recall and F1 score based on this.
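The exact-match criterion can be expressed as a set intersection between predicted and gold triplets, as in the sketch below. It scores a single list of triplets; aggregating counts over a whole test set before computing the metrics is a common convention but is an assumption here, as is the function name `triplet_prf`.

```python
from typing import Iterable, Tuple

TripletKey = Tuple[str, str, str]  # (aspect, opinion, sentiment)


def triplet_prf(
    predicted: Iterable[TripletKey], gold: Iterable[TripletKey]
) -> Tuple[float, float, float]:
    """Precision, recall and F1 under exact triplet matching."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)  # all three fields must match
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [
    ("Appetizers", "excellent", "positive"),
    ("meal", "great", "positive"),
    ("meal", "slightly expensive", "negative"),
]
predicted = [
    ("meal", "great", "positive"),
    ("meal", "expensive", "negative"),  # opinion span is incomplete
]
precision, recall, f1 = triplet_prf(predicted, gold)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

The second prediction gets no credit even though its aspect and polarity are right, because the opinion span does not match the gold span in full.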

Model                          | 14Lap                       | 14Rest                      | 15Rest                      | 16Rest
                               | Dev F1   Prec    Rec     F1 | Dev F1   Prec    Rec     F1 | Dev F1   Prec    Rec     F1 | Dev F1   Prec    Rec     F1
WhatHowWhy (Peng et al., 2020) |      -  37.38  50.38  42.87 |      -  43.24  63.66  51.46 |      -  48.07  57.51  52.32 |      -  46.96  64.24  54.21
OTE-MTL (Zhang et al., 2020)   |      -  54.26  41.07  46.75 |      -  63.07  58.25  60.56 |      -  60.88  42.68  50.18 |      -  65.65  54.28  59.42
GTS-BiLSTM (Wu et al., 2020)   |      -  58.02  40.11  47.43 |      -  71.41  53.00  60.84 |      -  64.57  44.33  52.57 |      -  70.17  55.95  62.26
JET^t-BiLSTM (Xu et al., 2020) |  48.26  54.84  34.44  42.31 |  53.14  66.76  49.09  56.58 |  55.06  59.77  42.27  49.52 |  58.45  63.59  50.97  56.59
JET^o-BiLSTM (Xu et al., 2020) |  45.83  55.98  35.36  43.34 |  53.54  61.50  55.13  58.14 |  60.97  64.37  44.33  52.50 |  60.90  70.94  57.00  63.21
GTS-BERT (Wu et al., 2020)     |      -  57.12  53.42  55.21 |      -  71.76  59.09  64.81 |      -  54.71  55.05  54.88 |      -  65.89  66.27  66.08
JET^t-BERT (Xu et al., 2020)   |  50.40  53.53  43.28  47.86 |  56.00  63.44  54.12  58.41 |  59.86  68.20  42.89  52.66 |  60.67  65.28  51.95  57.85
JET^o-BERT (Xu et al., 2020)   |  48.84  55.39  47.33  51.04 |  56.89  70.56  55.94  62.40 |  64.78  64.45  51.96  57.53 |  63.75  70.42  58.37  63.83
TOP (Huang et al., 2021)       |      -  57.84  59.33  58.58 |      -  63.59  73.44  68.16 |      -  54.53  63.30  58.59 |      -  63.57  71.98  67.52
BMRC (Chen et al., 2021)       |  56.08  65.91  52.15  58.18 |  62.83  72.17  65.43  68.64 |  72.47  62.48  55.55  58.79 |  70.91  69.87  65.68  67.35
ASTE-RL                        |  58.14  64.80  54.99  59.50 |  64.40  70.60  68.65  69.61 |  74.01  65.45  60.29  62.72 |  72.11  67.21  69.69  68.41
  - Pre-training only          |  57.35  62.00  55.84  58.73 |  64.50  69.70  69.23  69.47 |  72.84  63.31  61.61  62.44 |  71.50  64.76  70.74  67.57
Table 3. Results of ASTE-RL and Previous Methods on the ASTE-Data-V2 Dataset

3.2. Baselines

We compare the performance of ASTE-RL against the following baselines: (i) WhatHowWhy: Peng et al. (2020) proposed a multi-layer LSTM neural architecture for co-extraction of aspect terms with sentiments, and opinion terms, with a Graph Convolutional Network (Kipf and Welling, 2016) component to capture dependency information to enhance the co-extraction. (ii) OTE-MTL: Zhang et al. (2020) proposed a multi-task learning framework to jointly extract aspect and opinion terms while parsing word-level sentiment dependencies, before conducting a triplet decoding process. We use results from Huang et al. (2021) for OTE-MTL’s performance on ASTE-Data-V2. (iii) GTS: Wu et al. (2020) proposed an end-to-end grid tagging framework and a grid inference strategy to exploit mutual indication between opinion factors. We use results from Huang et al. (2021) for GTS’ performance on ASTE-Data-V2, and report them for two variants: bidirectional LSTM (BiLSTM) and BERT. (iv) JET: Xu et al. (2020) proposed a position-aware tagging scheme for triplet extraction. They encode information about sentiment polarities and distances between the start position of the aspect term and the opinion term’s start and end positions (JET^t) or vice versa (JET^o). We report the results for two variants: BiLSTM and BERT. (v) TOP: Huang et al. (2021) proposed a two-stage method to enhance correlations between aspect and opinion terms. Aspect and opinion terms are first extracted with sequence labeling, and artificial tags are added to each pair to establish correlation. A sentiment polarity is then identified for each pair using the resulting representations. (vi) BMRC: Chen et al. (2021) proposed a transformation of the ASTE task into a multi-turn MRC task and a bidirectional MRC framework to address it. They use non-restrictive, restrictive and sentiment classification queries in a three-turn process to extract triplets. We train and test BMRC on ASTE-Data-V2 over 5 runs with different random seeds.

            14Lap            14Rest           15Rest           16Rest
            ASTE-RL   BMRC   ASTE-RL   BMRC   ASTE-RL   BMRC   ASTE-RL   BMRC
Single        62.46  63.04     67.44  66.51     61.33  59.59     66.31  67.83
Multiple      57.70  53.77     70.23  69.17     63.95  56.37     69.84  64.11
No Overlap    62.24  62.80     72.16  70.81     62.17  59.75     70.13  69.23
Overlap       55.94  50.18     67.23  63.96     63.83  55.48     65.20  58.59
Table 4. F1 Scores for Multiple and Overlapping Triplets

3.3. Experimental Results

The experimental results are shown in Table 3. We observe that BERT-based models (their results are in the rows directly above ASTE-RL’s results in Table 3) generally perform better than the non-BERT models. Hence, we only experiment with BERT for our ASTE-RL model. We select our best model for each dataset based on its Dev F1 score. For reproducibility, we report the testing results averaged over 5 runs with different random seeds. ASTE-RL outperforms existing baselines on all four datasets, and significantly outperforms existing baselines on the 15Rest dataset. When compared to the second-best performance for each dataset, we observe an average improvement of 1.68% F1 score across all four datasets, and an improvement of 3.93% on 15Rest. We also observe that our model strikes a balance between the TOP and BMRC models in terms of precision and recall, and hypothesize that this balance can be flexibly shifted depending on β to fit dataset requirements, if we generalize F1 to Fβ, where Fβ is the weighted harmonic mean of precision and recall.
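The improvement figures can be re-derived from the test F1 values in Table 3, and the effect of β can be seen from the standard Fβ formula. The sketch below does both; the per-dataset second-best values are read from the TOP and BMRC rows of Table 3, and the precision/recall pair used for the Fβ demonstration is made up.

```python
# Test F1 of the strongest baseline per dataset (TOP on 14Lap and 16Rest,
# BMRC on 14Rest and 15Rest) and of ASTE-RL, as listed in Table 3.
second_best_f1 = {"14Lap": 58.58, "14Rest": 68.64, "15Rest": 58.79, "16Rest": 67.52}
aste_rl_f1 = {"14Lap": 59.50, "14Rest": 69.61, "15Rest": 62.72, "16Rest": 68.41}

gains = {name: aste_rl_f1[name] - second_best_f1[name] for name in aste_rl_f1}
average_gain = sum(gains.values()) / len(gains)  # about 1.68


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# With low precision and high recall, beta > 1 rewards the prediction more,
# beta < 1 rewards it less.
scores = {beta: f_beta(0.5, 1.0, beta) for beta in (0.5, 1.0, 2.0)}
print({k: round(v, 2) for k, v in gains.items()}, round(average_gain, 4))
print({k: round(v, 4) for k, v in scores.items()})
```

A reward built on Fβ with β above 1 would therefore push a policy towards recall, and β below 1 towards precision, which is the kind of shift hypothesized above.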

3.4. Effect of RL Fine-tuning

In Table 3, we also report our results for ASTE-RL without the RL fine-tuning step. In this setting, we pre-train our ASTE-RL for 40 epochs as usual and then continue training in the same way for another 15 epochs with a learning rate of 5e-6 (as used in the RL fine-tuning step). Compared to the RL fine-tuning setting with multinomial sampling, this setting has lower F1 scores, with an average decrease of 0.51% over 5 runs with different random seeds. In this setting, our model achieves slightly higher recall, but precision is significantly lower across all four datasets. This might be because multinomial sampling encourages more exploration after the initial pre-training of 40 epochs.

3.5. Analysis on Multiple & Overlapping Triplet Extraction

We show the results of ASTE-RL and BMRC in complex situations where there are multiple and overlapping triplets in a sentence in Table 4. For the multiple triplet scenario, we observe that ASTE-RL's performance increases for 14Rest, 15Rest and 16Rest and decreases for 14Lap as compared to the case where only one triplet is present in a sentence. For the overlapping triplet scenario, we observe a performance increase for 15Rest and a decrease for 14Lap, 14Rest and 16Rest.

In general, we observe that ASTE-RL handles multiple and overlapping triplets in a sentence more consistently than BMRC, owing to its hierarchical RL setup. Summing the F1 decreases over the datasets where performance drops, there is a total decrease of 4.76% for multiple triplet extraction for ASTE-RL as compared to 16.21% for BMRC, and a total decrease for overlapping triplet extraction of 16.16% for ASTE-RL as compared to 34.38% for BMRC.
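These totals can be reproduced from Table 4 by summing, per model, the F1 drops on the datasets where the harder setting scores lower. The sketch below does this under the assumption that the first value of each dataset pair in Table 4 belongs to ASTE-RL and the second to BMRC, which is the assignment that yields the totals quoted above; `total_decrease` is a hypothetical helper name.

```python
from typing import Dict, List

# F1 per dataset in the order 14Lap, 14Rest, 15Rest, 16Rest (Table 4).
table4: Dict[str, Dict[str, List[float]]] = {
    "ASTE-RL": {
        "Single": [62.46, 67.44, 61.33, 66.31],
        "Multiple": [57.70, 70.23, 63.95, 69.84],
        "No Overlap": [62.24, 72.16, 62.17, 70.13],
        "Overlap": [55.94, 67.23, 63.83, 65.20],
    },
    "BMRC": {
        "Single": [63.04, 66.51, 59.59, 67.83],
        "Multiple": [53.77, 69.17, 56.37, 64.11],
        "No Overlap": [62.80, 70.81, 59.75, 69.23],
        "Overlap": [50.18, 63.96, 55.48, 58.59],
    },
}


def total_decrease(easy: List[float], hard: List[float]) -> float:
    """Sum of F1 drops, counting only datasets where the score goes down."""
    return sum(max(0.0, e - h) for e, h in zip(easy, hard))


totals = {
    model: {
        "multiple": total_decrease(rows["Single"], rows["Multiple"]),
        "overlap": total_decrease(rows["No Overlap"], rows["Overlap"]),
    }
    for model, rows in table4.items()
}
print({m: {k: round(v, 2) for k, v in t.items()} for m, t in totals.items()})
```

Increases are ignored by this measure, so it captures how much each model degrades in the harder setting without crediting the datasets where it improves.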

4. Conclusion

In this work, we propose a novel ASTE-RL model based on a hierarchical reinforcement learning (RL) paradigm for aspect sentiment triplet extraction (ASTE). In this paradigm, we treat the aspect and opinion terms as arguments of the sentiment polarities. We decompose the ASTE task into a hierarchy of three subtasks: high-level sentiment polarity extraction, and low-level opinion and aspect term extractions. This approach is good at modeling the interactions between the three tasks and handling multiple and overlapping triplets. We incorporate elements of the multi-turn MRC framework in our model to further improve these interactions. Our proposed model achieves state-of-the-art performance on four challenging datasets for the ASTE task.


This project is supported by the DSO grant no. RTDST190702 awarded to SUTD titled Complex Question Answering.