Human-Like Decision Making: Document-level Aspect Sentiment Classification via Hierarchical Reinforcement Learning

10/21/2019
by   Jingjing Wang, et al.

Recently, neural networks have shown promising results on Document-level Aspect Sentiment Classification (DASC). However, these approaches often offer little transparency w.r.t. their inner working mechanisms and lack interpretability. In this paper, to simulate the steps a human takes when analyzing aspect sentiment in a document, we propose a new Hierarchical Reinforcement Learning (HRL) approach to DASC. This approach incorporates clause selection and word selection strategies to tackle the data noise problem in DASC. First, a high-level policy is proposed to select aspect-relevant clauses and discard noisy clauses. Then, a low-level policy is proposed to select sentiment-relevant words and discard noisy words inside the selected clauses. Finally, a sentiment rating predictor is designed to provide reward signals that guide both clause and word selection. Experimental results demonstrate the effectiveness of the proposed approach to DASC over state-of-the-art baselines.


1 Introduction

Document-level Aspect Sentiment Classification (DASC) is a fine-grained sentiment classification task in the field of sentiment analysis Pang and Lee (2007); Li et al. (2010). This task aims to predict the sentiment rating for each given aspect mentioned in a document-level review. For instance, Figure 1 shows a review document with four given aspects of a hotel (i.e., location, room, value, service). The goal of DASC is to predict the rating score towards each aspect by analyzing the whole document. In the last decade, this task has attracted increasing interest from researchers in the Natural Language Processing community Titov and McDonald (2008); Yin et al. (2017); Li et al. (2018).

Review Document
This hotel is [red]close to railway station and [red]very convenient to eat around but room of Hilton is [blue]a little uncomfortable. I’m often [blue]nitpicking for room decoration. Besides, the price is [black]very expensive although the staff service is [brown]professional.
Rating of Each Aspect
- location: ★ ★ ★ ★ ★  (5)       - room:    ★ ★ ★ ✩ ✩  (3)
- value:    ★ ✩ ✩ ✩ ✩  (1)       - service: ★ ★ ★ ★ ✩  (4)
Figure 1: An example of a review document, where clauses and words with different colors refer to different aspects.

In previous studies, neural models have been shown to be effective for improving performance on DASC. Despite these advantages, such complex neural network approaches often offer little transparency w.r.t. their inner working mechanisms and suffer from a lack of interpretability. However, clearly understanding where and how such a model makes a decision is rather important for developing real-world applications Liu et al. (2018); Marcus (2018).

As human beings, if asked to evaluate the sentiment rating for a specific aspect in a document, we often perform sentiment prediction in two steps. First, we select some aspect-relevant snippets (e.g., sentences/clauses) inside the document. Second, we select some sentiment-relevant words (e.g., sentiment words) inside these snippets to make a rating decision. For instance, for aspect location in Figure 1, we first select the aspect-relevant clauses, i.e., Clause1 and Clause2, and then select the sentiment-relevant words, i.e., “close” and “very convenient”, inside the two clauses to make the rating decision (5 stars).

Inspired by the above cognitive process of human beings, an ideal and interpretable solution for DASC is to select aspect-relevant clauses and sentiment-relevant words while discarding the noisy parts of a document for decision making. Two major challenges exist in this solution, as illustrated below.

The first challenge is how to select aspect-relevant clauses and discard irrelevant and noisy ones. For instance, for aspect location, Clause5, which mentions another aspect value (only 1 star), may introduce noise and should be discarded, because this noise can provide wrong signals that mislead the model into assigning a very low sentiment rating to aspect location. One possible way to alleviate this noise problem is to leverage the soft-attention mechanism as proposed in Li et al. (2018) and Wang et al. (2018). However, soft attention has the limitation that the softmax function always assigns small but non-zero probabilities to noisy clauses, which weakens the attention given to the few truly significant clauses for a particular aspect. Therefore, a well-behaved approach should discard noisy clauses for a specific aspect during model training.

The second challenge is how to select sentiment-relevant words and discard irrelevant and noisy ones. For instance, for aspect location, the words “this” and “is” in Clause1 are noisy words and should be discarded since they make no contribution to implying the sentiment rating. One possible way to alleviate this problem is again to leverage the soft-attention mechanism as proposed in Li et al. (2018). However, soft attention may introduce additional noise and lack interpretability because it tends to assign higher weights to some domain-specific words rather than to real sentiment-relevant words Mudinas et al. (2012); Zou et al. (2018). For instance, soft attention tends to regard the name of a hotel with a good reputation, “Hilton” in Clause3, as a positive word, which could mislead the model into assigning a higher rating to aspect room. Therefore, a well-behaved approach should highlight sentiment-relevant words and discard noisy words for a specific aspect during model training.

In this paper, we propose a Hierarchical Reinforcement Learning (HRL) approach with a high-level policy and a low-level policy to address the above two challenges in DASC. First, the high-level policy is leveraged to select aspect-relevant clauses and discard noisy clauses during model training. Then, the low-level policy is leveraged to select sentiment-relevant words and discard noisy words inside the selected clauses. Finally, a sentiment rating predictor is designed to provide reward signals that guide both clause and word selection. Empirical studies show that the proposed approach performs well by incorporating the clause selection and word selection strategies, and that it significantly outperforms several state-of-the-art approaches, including those with the soft-attention mechanism.

2 Hierarchical Reinforcement Learning

Figure 2: Overall architecture of our Hierarchical Reinforcement Learning (HRL) approach.

Figure 2 shows the overall framework of our Hierarchical Reinforcement Learning (HRL) approach, which contains three components: a high-level policy for clause selection (Section 2.1); a low-level policy for word selection (Section 2.2); and a sentiment rating predictor that provides reward signals to guide both clause and word selection (Section 2.3).

As a preprocessing step, we adopt RST-style discourse segmentation Mann (1988) to segment all documents in the corpus into Elementary Discourse Units (EDUs), and consider these EDUs as clauses by following Wang et al. (2018). (In preliminary experiments, we tried directly adopting sentence segmentation by following Li et al. (2018) rather than clause splitting, and found that clause splitting achieves better performance; detailed comparison results are presented in the ablation study of Section 3.2.)

In summary, we formulate the task of DASC as a semi-Markov decision process Sutton et al. (1999b), i.e., hierarchical reinforcement learning with a high-level policy and a low-level policy. In particular, our HRL approach for DASC works as follows. Given a review document with a clause sequence and an aspect, the high-level policy decides whether a clause mentions this aspect. If yes, the high-level policy selects this clause and launches the low-level policy, which scans the words inside the selected clause one by one in order to select sentiment-relevant words. Otherwise, the high-level policy skips the current clause and turns to the next clause, until all clauses in the review document are scanned. During clause and word selection, a sentiment rating predictor is employed to provide reward signals to guide both selections.
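
To make the described control flow concrete, the following is a minimal sketch of the hierarchical selection process, not the authors' implementation. The callables select_clause, select_word, and predict_rating are hypothetical placeholders standing in for the high-level policy, the low-level policy, and the rating predictor.

```python
# Minimal control-flow sketch of the hierarchical selection process described above.
def hierarchical_selection(clauses, aspect, select_clause, select_word, predict_rating):
    """Scan clauses with the high-level policy; for each selected clause,
    scan its words with the low-level policy; finally predict the rating."""
    selected_clauses, selected_words = [], []
    for clause in clauses:                       # high-level pass over clauses
        if select_clause(clause, aspect):        # option o_i = 1: clause mentions the aspect
            selected_clauses.append(clause)
            for word in clause:                  # low-level pass over words of the clause
                if select_word(word, aspect):    # action a_t = 1: word is sentiment-relevant
                    selected_words.append(word)
        # otherwise (o_i = 0): skip the noisy clause and move on
    return predict_rating(selected_clauses, selected_words)

# Example usage with trivial stand-in policies:
clauses = [["close", "to", "railway", "station"], ["room", "a", "little", "uncomfortable"]]
rating = hierarchical_selection(
    clauses, "location",
    select_clause=lambda c, a: a == "location" and "station" in c,
    select_word=lambda w, a: w in {"close", "uncomfortable"},
    predict_rating=lambda cs, ws: 5 if "close" in ws else 3,
)
print(rating)  # -> 5
```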

2.1 Clause Selection with High-level Policy

Assume that a review document d with a given aspect a has been segmented into a clause sequence c_1, c_2, ..., c_N. The high-level policy aims to select the clauses that truly mention aspect a and discard the noisy ones. Here, each clause c_i consists of a word sequence w_{i1}, w_{i2}, ..., w_{iT_i}. Once a clause is selected, it is passed to the low-level policy for further word selection.

During clause selection, we adopt a stochastic policy as the high-level policy π^h, which generates a conditional probability distribution over the option (i.e., high-level action) sequence o_1, o_2, ..., o_N with o_i ∈ {0, 1}. Here, o_i = 1 indicates that clause c_i is selected; otherwise, o_i = 0 indicates that c_i is discarded. More specifically, we adopt an LSTM model LSTM^h to construct the high-level policy for performing clause selection over the clause sequence. In LSTM^h, the hidden state h^h_i and memory cell m^h_i at the i-th time-step are given by,

$$ (m^h_i, h^h_i) = \begin{cases} \Phi^h(m^h_{i-1}, h^h_{i-1}, v_{c_i}) & \text{if } o_i = 1 \\ (m^h_{i-1}, h^h_{i-1}) & \text{if } o_i = 0 \end{cases} \quad (1) $$

where v_{c_i} is the vector representation of clause c_i, initialized by the hidden state of the last word in clause c_i. Here, v_{c_i} is obtained from the pre-trained LSTM^l (presented in line 3 of Algorithm 1), and Φ^h denotes all gate functions and the update function of LSTM^h. Note that if o_i = 0, LSTM^h will skip (i.e., discard) and not encode clause c_i; the memory cell and hidden state of the current time-step are then directly copied from the previous time-step.
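
A minimal sketch of this skip-or-encode update follows. The paper does not prescribe a framework; PyTorch and the class name SkipLSTM are our own choices for illustration, assuming a standard LSTM cell underneath.

```python
import torch
import torch.nn as nn

class SkipLSTM(nn.Module):
    """Sketch of the Eq. (1) behavior: encode a clause only when it is selected."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)

    def step(self, v, h_prev, c_prev, option):
        if option == 1:                     # selected clause: run the LSTM update
            h, c = self.cell(v, (h_prev, c_prev))
        else:                               # discarded clause: carry the state over unchanged
            h, c = h_prev, c_prev
        return h, c

# Example: two clauses, the second one discarded.
lstm_h = SkipLSTM(input_size=200, hidden_size=200)
h = c = torch.zeros(1, 200)
for v, o in [(torch.randn(1, 200), 1), (torch.randn(1, 200), 0)]:
    h, c = lstm_h.step(v, h, c, o)
```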

In principle, the high-level policy π^h uses a reward to guide clause selection over the clause sequence. It samples an option o_i with probability π^h(o_i | s_i) at each state s_i. More concretely, the state, option and reward of π^h are defined as follows.

State. The state s_i at the i-th time-step should provide adequate information for deciding whether or not to select a clause for π^h. Thus, state s_i is composed of four parts, including the aspect embedding v_a of aspect a. (The aspect embedding is initialized by the average of its keyword embeddings; keywords for aspects, e.g., service and smile for aspect service, are obtained from Yin et al. (2017).)

Option. π^h samples option o_i according to the conditional probability π^h(o_i | s_i), which can be cast as a binary classification problem. Thus, we adopt a logistic function to define π^h(o_i | s_i):

$$ o_i \sim \pi^h(o_i \mid s_i) = \sigma(W^h s_i) \quad (2) $$

where W^h is the trainable parameter; ∼ denotes the sampling operation; σ denotes the sigmoid function.
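
A short sketch of this sampling step is shown below. W_h is a hypothetical trainable weight vector and the 800-dimensional state is illustrative only; the sketch simply draws a Bernoulli option from the sigmoid score, as Eq. (2) describes.

```python
import torch

def sample_option(state, W_h):
    """Return a sampled option (0/1) and its log-probability for policy-gradient training."""
    prob = torch.sigmoid(W_h @ state)                  # pi^h(o_i = 1 | s_i)
    dist = torch.distributions.Bernoulli(probs=prob)
    option = dist.sample()
    return int(option.item()), dist.log_prob(option)

state = torch.randn(800)                               # e.g., a concatenation of four 200-dim parts
W_h = torch.randn(800, requires_grad=True)
o, log_p = sample_option(state, W_h)
```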

Reward. In order to select aspect-relevant clauses from the clause sequence c_1, ..., c_N, given a sampled option trajectory o_1, ..., o_N, we compute the high-level cumulative reward R^h_i at the i-th time-step as follows:

$$ R^h_i = \lambda_1 \cos(v_a, h^h_i) + \lambda_2 R^l_i + \lambda_3\, \gamma^{N-i} \log P(y \mid d) \quad (3) $$

where R^h_i consists of three different terms: 1) The first term is a cosine intermediate reward computed as the similarity between the aspect embedding v_a and the hidden state h^h_i of the i-th clause. This reward provides aspect supervision signals that guide the policy to select aspect-relevant clauses. 2) The second term is an intermediate reward R^l_i from the low-level policy, obtained after the word selection in the selected clause is finished. Note that if clause c_i is discarded, R^l_i = 0. This reward provides feedback indicating how well the clause selection performs. 3) The third term is a delay reward from the sentiment rating predictor. After π^h finishes all options, we feed the last hidden state of LSTM^h to the softmax decoder of the sentiment rating predictor and obtain the rating probability P(y | d) for the ground-truth rating label y to compute this delay reward. This reward provides additional signals that guide the policy to select discriminative clauses. Besides, γ is the discount factor, and λ_1, λ_2 and λ_3 are weight parameters.
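
The sketch below combines the three described reward terms. The weights default to the values reported in Section 3.1 (λ_1, λ_2, λ_3 = 0.25, 0.25, 0.5 and γ = 0.8); how exactly γ discounts the delay reward is our assumption rather than a detail stated in the text.

```python
import torch
import torch.nn.functional as F

def high_level_reward(aspect_emb, clause_hidden, low_level_reward, rating_prob,
                      lambdas=(0.25, 0.25, 0.5), gamma=0.8, steps_to_end=0):
    """Sketch of the three high-level reward terms described for Eq. (3)."""
    cos = F.cosine_similarity(aspect_emb, clause_hidden, dim=0)   # aspect-relevance term
    delay = (gamma ** steps_to_end) * torch.log(rating_prob)      # delayed rating term (assumed form)
    return lambdas[0] * cos + lambdas[1] * low_level_reward + lambdas[2] * delay

r = high_level_reward(torch.randn(200), torch.randn(200),
                      low_level_reward=torch.tensor(0.1),
                      rating_prob=torch.tensor(0.7), steps_to_end=3)
```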

2.2 Word Selection with Low-level Policy

Given the word sequence w_{i1}, ..., w_{iT_i} of a clause c_i selected by the high-level policy, the low-level policy aims to select the sentiment-relevant words and discard the noisy ones.

During word selection, we again adopt a stochastic policy as the low-level policy π^l, which generates a conditional probability distribution over the action sequence a_1, ..., a_{T_i}, where a_t = 1 indicates that the t-th word w_{it} in the i-th clause is selected, and a_t = 0 indicates that w_{it} is discarded. Similar to clause selection, we adopt another LSTM model LSTM^l to construct the low-level policy for performing word selection over the word sequence of each clause. (Note that, as shown in Figure 2, LSTM^l is shared by all clauses selected by the high-level policy.) In LSTM^l, the hidden state h^l_t and memory cell m^l_t at the t-th time-step (here, we omit the clause index and only use t to denote the t-th time-step in the i-th clause c_i) are given by,

$$ (m^l_t, h^l_t) = \begin{cases} \Phi^l(m^l_{t-1}, h^l_{t-1}, v_{w_t}) & \text{if } a_t = 1 \\ (m^l_{t-1}, h^l_{t-1}) & \text{if } a_t = 0 \end{cases} \quad (4) $$

where v_{w_t} is the word embedding of w_{it}. Similar to LSTM^h, in LSTM^l, if a_t = 0, i.e., word w_{it} is discarded, the memory cell and hidden state of the current time-step are directly copied from the previous time-step. We next illustrate the state, action and reward of the low-level policy as follows.

State. The state s_t at the t-th time-step should provide adequate information for deciding whether or not to select a word. Thus, the state s_t is composed of three parts.

Action. π^l samples action a_t according to the conditional probability π^l(a_t | s_t). Thus, similar to the high-level policy, we adopt a logistic function to define π^l(a_t | s_t):

$$ a_t \sim \pi^l(a_t \mid s_t) = \sigma(W^l s_t) \quad (5) $$

where W^l is the trainable parameter.

Reward. Similarly, in order to select sentiment-relevant words from the word sequence w_{i1}, ..., w_{iT_i}, given a sampled action trajectory a_1, ..., a_{T_i}, we compute the low-level cumulative reward R^l_t at the t-th time-step as:

$$ R^l_t = \mu_1 \log P(y \mid d) - \mu_2\, \frac{N'}{T_i} \quad (6) $$

where R^l_t consists of two terms: 1) Similar to the high-level policy, the first term is a delay reward provided by the sentiment rating predictor. After π^l finishes all actions, we feed the last hidden state of LSTM^l in the i-th clause to the softmax decoder of the sentiment rating predictor and thereby obtain this delay reward. This reward provides rating supervision that guides the policy to select discriminative, i.e., sentiment-relevant, words. 2) The second term is a penalty delay reward, where N' denotes the number of selected words. The basic idea of this penalty is to select as few words as possible, because the sentiment-relevant words usually form a small subset of all words inside the clause. Note that we could also adopt external sentiment lexicons to achieve this goal, but sentiment lexicons are difficult to obtain in many real-world applications. Besides, μ_1 and μ_2 are weight parameters.
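
The sketch below puts the two described terms together; the default weights are the values reported in Section 3.1 (μ_1, μ_2 = 0.6, 0.4), and the exact functional form of the penalty is our assumption.

```python
import torch

def low_level_reward(rating_prob, num_selected, num_words, mu=(0.6, 0.4)):
    """Sketch of Eq. (6): delayed rating reward minus a selected-word-count penalty."""
    delay = torch.log(rating_prob)                   # reward from the rating predictor
    penalty = num_selected / max(num_words, 1)       # encourage selecting few words
    return mu[0] * delay - mu[1] * penalty

r = low_level_reward(torch.tensor(0.7), num_selected=3, num_words=12)
```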

2.3 Sentiment Rating Predictor

The goal of the sentiment rating predictor is two-fold. On the one hand, during model training, the sentiment rating predictor uses a softmax decoder to provide rating probabilities as the reward signals (see Eq. (3) and Eq. (6)) that guide both clause and word selection.

On the other hand, when model training is finished, i.e., both the high-level and low-level policies have finished all their selections, the goal of the sentiment rating predictor is to perform DASC. Specifically, we first regard the last state of LSTM^h as the representation of all selected clauses and the last state of LSTM^l as the representation of all selected words. Then, we concatenate the two to compute the final representation r of the review document, where [;] denotes the concatenation operation. Finally, to perform DASC, we feed r to a softmax decoder as follows:

Softmax Decoder. We first feed r to a softmax classifier, whose output vector is computed from r with trainable parameters. Then, the probability of labeling the document with sentiment rating y is read from this output vector. Finally, the label with the highest probability stands for the predicted sentiment rating.

Note that, for an aspect, if no clauses or words are finally selected from a review document, the model assigns a random rating to this aspect.
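
A minimal sketch of this predictor is given below: it concatenates the clause-level and word-level representations and feeds them to a linear softmax layer. The class name and dimensions are illustrative (the paper uses 200-dimensional hidden states).

```python
import torch
import torch.nn as nn

class RatingPredictor(nn.Module):
    """Sketch of the sentiment rating predictor's softmax decoder."""
    def __init__(self, hidden_size=200, num_ratings=5):
        super().__init__()
        self.classifier = nn.Linear(2 * hidden_size, num_ratings)

    def forward(self, clause_repr, word_repr):
        r = torch.cat([clause_repr, word_repr], dim=-1)      # final document representation
        probs = torch.softmax(self.classifier(r), dim=-1)    # distribution over rating labels
        return probs

predictor = RatingPredictor()
probs = predictor(torch.randn(200), torch.randn(200))
predicted_rating = int(torch.argmax(probs)) + 1              # ratings are 1-based
```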

2.4 Model Training via Policy Gradient and Back-Propagation

The parameters of HRL are learned according to Algorithm 1. Specifically, these parameters can be divided into two groups: 1) the parameters θ^h and θ^l of the high-level policy and the low-level policy, respectively; 2) the parameters θ^s of LSTM^h, LSTM^l and the softmax decoder.

For the parameters θ^h of the high-level policy, we optimize them with the policy gradient (REINFORCE) Williams (1992); Sutton et al. (1999a). The policy gradient w.r.t. θ^h is computed by differentiating the maximized expected reward as follows:

$$ \nabla_{\theta^h} J(\theta^h) = \mathbb{E}\Big[\sum_{i=1}^{N} \nabla_{\theta^h} \log \pi^h(o_i \mid s_i)\, \big(R^h_i - b^h\big)\Big] \quad (7) $$

where R^h_i - b^h is the advantage estimate of the high-level reward. Here, b^h is the baseline Williams (1992), which is used to reduce the variance of the high-level reward without theoretically altering its expectation. In practice, we sample several trajectories over the clause sequence with the current high-level policy; the model assigns a reward score to each sampled sequence according to the designed reward function, and b^h is then estimated as the average of those rewards. Similarly, the policy gradient w.r.t. the parameters θ^l of the low-level policy is given by,

$$ \nabla_{\theta^l} J(\theta^l) = \mathbb{E}\Big[\sum_{t=1}^{T_i} \nabla_{\theta^l} \log \pi^l(a_t \mid s_t)\, \big(R^l_t - b^l\big)\Big] \quad (8) $$

where R^l_t - b^l is the advantage estimate of the low-level reward. Similarly, the baseline b^l is used to reduce the variance of the low-level reward.
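
The sketch below shows a REINFORCE update with an average-reward baseline in the spirit of Eq. (7)/(8); the surrogate loss and variable names are our own illustration, not the authors' code.

```python
import torch

def policy_gradient_loss(log_probs, rewards, baseline):
    """Surrogate loss whose gradient is the REINFORCE estimator with a baseline."""
    advantages = [r - baseline for r in rewards]               # variance reduction
    return -sum(lp * adv for lp, adv in zip(log_probs, advantages))

# The baseline is estimated as the average reward of several sampled trajectories.
sampled_trajectory_rewards = [0.42, 0.55, 0.38]
baseline = sum(sampled_trajectory_rewards) / len(sampled_trajectory_rewards)

log_probs = [torch.log(torch.tensor(0.8, requires_grad=True)),
             torch.log(torch.tensor(0.6, requires_grad=True))]
loss = policy_gradient_loss(log_probs, rewards=[0.5, 0.4], baseline=baseline)
loss.backward()   # gradients flow to the policy parameters that produced log_probs
```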

For the parameters θ^s, we optimize them with back-propagation. The learning objective is to minimize the cross-entropy loss in the classification phase, i.e.,

$$ L(\theta^s) = -\sum_{(d,a)\,\in\,\mathcal{D}} \log P(y_{d,a} \mid d, a) + \beta \lVert \theta^s \rVert^2 \quad (9) $$

where (d, a) denotes a review document d with a given aspect a from the corpus D; y_{d,a} is the ground-truth sentiment rating for aspect a; and the last term is a regularization.
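
A short sketch of this supervised objective follows: cross-entropy over the predicted rating distribution plus an L2 regularization term. The regularization weight value is illustrative, since the paper tunes hyper-parameters on the development set.

```python
import torch
import torch.nn.functional as F

def supervised_loss(logits, gold_rating, parameters, reg_weight=1e-5):
    """Sketch of Eq. (9): cross-entropy plus L2 regularization over theta^s."""
    ce = F.cross_entropy(logits.unsqueeze(0), gold_rating.unsqueeze(0))   # -log P(y | d, a)
    l2 = sum((p ** 2).sum() for p in parameters)                          # ||theta^s||^2
    return ce + reg_weight * l2

layer = torch.nn.Linear(400, 5)                  # stand-in for the softmax decoder
logits = layer(torch.randn(400))                 # unnormalized scores over 5 ratings
gold = torch.tensor(3)                           # ground-truth rating index (0-based)
loss = supervised_loss(logits, gold, parameters=layer.parameters())
loss.backward()
```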

1:  Input: Corpus D; a review document d with a clause sequence c_1, ..., c_N; a clause c_i with a word sequence w_{i1}, ..., w_{iT_i}.
2:  Initialize parameters θ^h, θ^l and θ^s randomly;
3:  Pre-train LSTM^l by forcing π^l to select all words for classification, and update its parameters by Eq. (9);
4:  Pre-train LSTM^h by forcing π^h to select all clauses for classification, and update its parameters by Eq. (9);
5:  Fix all parameters θ^s and update θ^h, θ^l as follows:
6:  for review document d in D do
7:     for clause c_i in d do
8:         Sample option o_i by Eq. (2);
9:         if option o_i = 1 then
10:            for word w_{it} in c_i do
11:               Sample action a_t by Eq. (5);
12:            end for
13:            Compute R^l_t by Eq. (6);
14:            Update θ^l by Eq. (8);
15:         end if
16:     end for
17:     Compute R^h_i by Eq. (3);
18:     Update θ^h by Eq. (7);
19:  end for
Algorithm 1: Hierarchical reinforcement learning
Datasets #documents #words/doc #Aspect
TripUser 58632 181.03 7
TripAdvisor 29391 251.7 7
BeerAdvocate 51020 144.5 4
Table 1: Statistics of three datasets. #words/doc denotes the number of words (average per document). #Aspect denotes the number of aspects in each dataset. The rating scales of TripAdvisor and BeerAdvocate are 1-5 and 1-10 respectively.
Approaches            TripUser                      TripAdvisor                   BeerAdvocate
                      Development    Test           Development    Test           Development    Test
                      Acc.   MSE     Acc.   MSE     Acc.   MSE     Acc.   MSE     Acc.   MSE     Acc.   MSE
SVM                   -      -       46.35  1.025   34.30  1.982   35.26  1.963   25.70  3.286   25.79  3.270
LSTM                  53.23  0.787   52.74  0.794   43.85  1.525   44.02  1.470   35.23  2.112   34.78  2.097
MAMC                  -      -       55.49  0.583   46.21  1.091   46.56  1.083   39.43  1.696   38.06  1.755
HARN                  -      -       58.15  0.528   -      -       48.21  0.923   39.81  1.672   38.19  1.751
HUARN                 -      -       60.70  0.514   -      -       -      -       -      -       -      -
C-HAN                 58.49  0.602   57.38  0.543   47.61  0.914   47.08  0.955   38.67  1.703   37.95  1.801
HS-LSTM               59.75  0.566   59.01  0.524   48.45  0.947   46.84  1.013   37.43  1.870   36.83  1.912
RL-Word-Selection     60.15  0.475   59.55  0.519   48.55  0.913   48.51  0.917   39.92  1.648   38.45  1.697
RL-Clause-Selection   61.32  0.433   60.54  0.461   51.05  0.762   50.02  0.781   41.39  1.505   39.76  1.622
HRL                   62.97  0.336   62.84  0.351   52.71  0.652   52.27  0.662   43.41  1.416   41.39  1.503
Table 2: Comparison of our approaches and baseline approaches to DASC. Several baseline results are retrieved from Li et al. (2018) and Yin et al. (2017).

3 Experimentation

3.1 Experimental Settings

Data. We conduct our experiments on three public DASC datasets, i.e., TripUser Li et al. (2018), TripAdvisor Wang et al. (2010) and BeerAdvocate McAuley et al. (2012); Lei et al. (2016). In the experiments, we adopt the Discourse Segmentation Tool (http://alt.qcri.org/tools/discourse-parser/) to segment all reviews in the three datasets into EDUs (i.e., clauses). Moreover, we adopt the training/development/testing split (8:1:1) following Yin et al. (2017) and Li et al. (2018). Table 1 shows the statistics of the three datasets.
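
The following is a minimal sketch of the 8:1:1 split described above; the seed and shuffling strategy are our assumptions, since the paper only specifies the ratio.

```python
import random

def split_8_1_1(documents, seed=42):
    """Split a list of documents into training/development/test sets with an 8:1:1 ratio."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    n = len(docs)
    n_train, n_dev = int(0.8 * n), int(0.1 * n)
    return docs[:n_train], docs[n_train:n_train + n_dev], docs[n_train + n_dev:]

train, dev, test = split_8_1_1(range(29391))    # e.g., the TripAdvisor documents
print(len(train), len(dev), len(test))          # -> 23512 2939 2940
```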

Implementation Details. We adopt the pre-trained 200-dimension word embeddings provided by Yin et al. (2017). The dimension of the LSTM hidden states is set to 200. The other hyper-parameters are tuned according to the performance on the development set. Specifically, we adopt the Adam optimizer Kingma and Ba (2014) with an initial learning rate of 0.012 for cross-entropy training and the SGD optimizer with a learning rate of 0.008 for all policy-gradient training. For the rewards of the high-level and low-level policies, the discount factor γ is 0.8; λ_1, λ_2 and λ_3 are 0.25, 0.25 and 0.5, respectively; μ_1 and μ_2 are 0.6 and 0.4. Additionally, the batch size is set to 64 and the dropout rate is 0.2 (the regularization weight in Eq. (9) is likewise tuned on the development set).
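
For reference, the settings stated in this paragraph can be collected into a single configuration dictionary; the key names below are our own.

```python
# Hyper-parameter settings reported in Section 3.1 (names are illustrative).
CONFIG = {
    "word_embedding_dim": 200,
    "lstm_hidden_dim": 200,
    "adam_lr_cross_entropy": 0.012,
    "sgd_lr_policy_gradient": 0.008,
    "gamma": 0.8,                    # discount factor
    "lambdas": (0.25, 0.25, 0.5),    # high-level reward weights
    "mus": (0.6, 0.4),               # low-level reward weights
    "batch_size": 64,
    "dropout": 0.2,
}
```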

Evaluation Metrics. Performance is evaluated using Accuracy (Acc.) and MSE, following Yin et al. (2017). Moreover, the t-test is used to evaluate the significance of the performance difference between two approaches Yang and Liu (1999).
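
For completeness, minimal implementations of these two metrics are shown below, assuming gold and pred are equal-length lists of integer ratings.

```python
def accuracy(gold, pred):
    """Fraction of documents whose predicted rating exactly matches the gold rating."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def mse(gold, pred):
    """Mean squared error between gold and predicted ratings."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)

gold, pred = [5, 3, 1, 4], [5, 3, 2, 4]
print(accuracy(gold, pred), mse(gold, pred))   # -> 0.75 0.25
```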

Baselines. We compare HRL with the following baselines: 1) SVM Yin et al. (2017). This approach adopts only unigram and bigram features to train an SVM classifier. 2) LSTM Tang et al. (2015). This is a neural network approach to document-level sentiment classification which employs a gated LSTM to learn the text representation. 3) MAMC Yin et al. (2017). This approach employs hierarchical iterative attention to learn aspect-specific representations and is a state-of-the-art approach to DASC. 4) HARN Li et al. (2018). This approach adopts hierarchical attention to incorporate the overall rating and aspect information so as to learn aspect-specific representations; it is another state-of-the-art approach to DASC. 5) HUARN Li et al. (2018). This approach extends HARN by integrating additional user information and is another state-of-the-art approach to DASC. 6) C-HAN Wang et al. (2018). This approach adopts hierarchical attention to incorporate clause and aspect information so as to learn the text representation. Although it is a state-of-the-art approach to sentence-level ASC, it can be directly applied to DASC. 7) HS-LSTM Zhang et al. (2018). This is a reinforcement learning approach to text classification, which employs a hierarchical LSTM to learn the text representation. 8) RL-Word-Selection. A variant of our approach that leverages only the word selection strategy, i.e., the low-level policy. 9) RL-Clause-Selection. A variant of our approach that leverages only the clause selection strategy, i.e., the high-level policy.

Approaches                                          TripUser        TripAdvisor     BeerAdvocate
                                                    Acc.    MSE     Acc.    MSE     Acc.    MSE
HRL                                                 62.84   0.351   52.27   0.662   41.39   1.503
   w/o cosine intermediate reward in Eq. (3)        60.15   0.487   49.02   0.811   38.72   1.673
   w/o penalty delay reward in Eq. (6)              61.52   0.425   51.15   0.711   40.13   1.577
   w/o the representation of selected clauses       58.40   0.622   46.33   1.207   38.61   1.684
   w/o the representation of selected words         60.08   0.497   48.94   0.886   39.92   1.626
   using sentence splitting instead of clauses      59.74   0.504   50.11   0.842   39.33   1.651

Table 3: Ablation study of HRL on the three datasets.

3.2 Experimental Results

Table 2 shows the performance comparison of the different approaches. From this table, we can see that all LSTM-based approaches outperform SVM, showing that LSTM has the potential to automatically learn text representations and can bring performance improvements for DASC.

The four state-of-the-art ASC approaches, i.e., MAMC, HARN, HUARN and C-HAN, all perform better than LSTM. These results confirm the helpfulness of considering aspect information in DASC. Besides, we find that the reinforcement learning based approach HS-LSTM, which does not consider aspect information, achieves comparable performance with MAMC, HARN and C-HAN, and even beats MAMC on the TripUser and TripAdvisor datasets, which demonstrates that reinforcement learning is a good choice for learning text representations for DASC.

Our approaches RL-Word-Selection and RL-Clause-Selection outperform most of the above approaches and perform only slightly worse than HUARN. This result encourages performing clause or word selection in DASC. Among all approaches, our HRL approach performs best and significantly outperforms (p-value < 0.01) the strong baseline HUARN, which actually exploits additional external information, such as the overall rating and user information. These results encourage performing both clause and word selection in DASC.

Ablation Study. Further, we conduct an ablation study of HRL to evaluate the contribution of each component. The results are shown in Table 3. From this table, we can see that: 1) Using the cosine intermediate reward in Eq. (3) improves Acc. by 2.87% on average over the three datasets. 2) Using the penalty delay reward in Eq. (6) improves Acc. by 1.23% on average. 3) Concatenating the additional representation of the selected clauses in the rating predictor improves Acc. by 4.38%. 4) Concatenating the additional representation of the selected words in the rating predictor improves Acc. by 2.72%. 5) Using clause splitting instead of sentence splitting improves Acc. by 2.44%. This confirms that clauses are more appropriate segmentation units than sentences, since about 90% of clauses contain only one opinion expression, as reported in Bayoudhi et al. (2015). For instance, as shown in Figure 1, if we use sentences as the segmentation units, Clause1-Clause3 will be assigned to one unit although they talk about two aspects, i.e., location and room. In this scenario, sentence selection cannot discard the noisy parts inside the sentence for aspect location or room.

Figure 3: Training reward with different epochs.

(a) For aspect location
(b) For aspect room
Figure 4: Visualizations of clause selection (along the rows) and word selection (along the columns) for aspect (a) location and (b) room. Red denotes that a clause has been selected, blue denotes that a word has been selected, and other colors denote that the token has been discarded.

4 Analysis and Discussion

Analysis of HRL Training. Figure 3 shows the average per-epoch rewards of the high-level and low-level policies on BeerAdvocate. To clearly observe the change of the reward, all rewards are normalized following Lillicrap et al. (2015). From this figure, we can see that both the high-level and low-level rewards increase as the training algorithm iterates. This result indicates that our HRL approach is capable of stably revising its policies to obtain more discriminative clauses and words for better performance.

Analysis of Clause and Word Selection. Figure 4 visualizes how our HRL approach performs clause selection and word selection on a review document. From this figure, we can see that HRL is able to precisely select the aspect-relevant clauses, i.e., Clause1 and Clause2, for aspect location, and Clause3 and Clause4 for room. Further, HRL is able to select the sentiment-relevant words, such as “close” and “very convenient” for aspect location, and “a little uncomfortable” and “nitpicking” for room.

Error Type Breakdown. We analyze the error cases in our experiments and broadly categorize them into three types. (1) The first type of error is due to negation words. For instance, for the review “The taste of this beer is not good, don’t buy it”, HRL precisely selects the sentiment word “good” but fails to select the negation word “not”. This inspires us to optimize our approach to better capture negation scope in future work. (2) The second type of error is due to comparative opinions. For instance, for the review “The room of Sheraton is much better than this one.”, HRL incorrectly predicts a high rating (5 stars) for aspect room. It would be interesting to see whether incorporating syntactic information can solve this problem and bring performance improvement. (3) Finally, some errors are due to mistakes during clause splitting (i.e., EDU splitting). For instance, the review “[This hotel having good location] [often needs lots of time to check in.]” is assigned to one clause although it talks about two aspects, i.e., location and check-in/front desk. This encourages improving the performance of clause splitting for informal review texts.

5 Related Work

Aspect Sentiment Classification. Traditional studies on DASC mainly focus on feature engineering to explore effective features Titov and McDonald (2008); Lu et al. (2011); McAuley et al. (2012). Recently, neural networks, with their ability to automatically mine features, have shown promising results on DASC. Lei et al. (2016) focused on extracting rationales for aspects and built a neural text regressor to predict aspect ratings; Yin et al. (2017) used hierarchical iterative attention to learn aspect-specific text representations for DASC; Li et al. (2018) employed a hierarchical attention approach to DASC which incorporates both external user and overall rating information. Besides, neural networks have been widely adopted for a closely related task, i.e., sentence-level Aspect Sentiment Classification Wang et al. (2016); Tang et al. (2016); Wang and Lu (2018).

Reinforcement Learning.

In recent years, reinforcement learning has been successfully applied to several NLP tasks. Guo (2015) employed deep Q-learning to improve the seq2seq model for text generation; Li et al. (2016) showed how to apply deep reinforcement learning to model future reward in chatbot dialogue; Takanobu et al. (2018) employed hierarchical reinforcement learning for relation extraction; Zhang et al. (2018) combined LSTM with reinforcement learning to learn structured representations for text classification, which is inspirational to our approach.

Unlike all the above studies, inspired by the cognitive process of human beings, this paper proposes a new HRL approach to the DASC task. To the best of our knowledge, this is the first attempt to address DASC with HRL.

6 Conclusion

In this paper, we propose a hierarchical reinforcement learning approach to DASC. The main idea of the proposed approach is to perform sentiment classification the way human beings do. Specifically, our approach employs a high-level policy and a low-level policy to perform clause selection and word selection in DASC, respectively. Experiments show that both clause and word selection are effective for DASC and that the proposed approach significantly outperforms several state-of-the-art baselines.

In our future work, we would like to solve other challenges in DASC, e.g., negation detection problem, to further improve the performance. Furthermore, we would like to apply our HRL approach to other sentiment analysis tasks, such as aspect and opinion co-extraction, and dialog-level sentiment analysis.

Acknowledgments

We thank our anonymous reviewers for their helpful comments. This work was supported by three NSFC grants, i.e., No.61672366, No.61702149 and No.61525205. This work was also supported by the joint research project of Alibaba Group and Soochow University, and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

References

  • D. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. CoRR abs/1412.6980.
  • T. Lei, R. Barzilay, and T. Jaakkola (2016) Rationalizing neural predictions. In Proceedings of EMNLP 2016.
  • J. Li, H. Yang, and C. Zong (2018) Document-level multi-aspect sentiment classification by jointly modeling users, aspects, and overall ratings. In Proceedings of COLING 2018.
  • S. Li, C. Huang, G. Zhou, and S. Y. M. Lee (2010) Employing personal/impersonal views in supervised and semi-supervised sentiment classification. In Proceedings of ACL 2010.
  • X. Liu, L. Mou, H. Cui, Z. Lu, and S. Song (2018) JUMPER: learning when to make classification decisions in reading. CoRR abs/1807.02314.
  • B. Lu, M. Ott, C. Cardie, and B. K. Tsou (2011) Multi-aspect sentiment analysis with topic models. In Proceedings of the ICDM 2011 Workshops.
  • W. Mann (1988) Rhetorical structure theory: toward a functional theory of text organization. Text 8(3), pp. 243–281.
  • G. Marcus (2018) Deep learning: a critical appraisal. CoRR abs/1801.00631.
  • J. McAuley, J. Leskovec, and D. Jurafsky (2012) Learning attitudes and attributes from multi-aspect reviews. In Proceedings of ICDM 2012.
  • A. Mudinas, D. Zhang, and M. Levene (2012) Combining lexicon and learning based approaches for concept-level sentiment analysis. In Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining, pp. 5.
  • B. Pang and L. Lee (2007) Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2).
  • R. Sutton, D. McAllester, S. Singh, and Y. Mansour (1999a) Policy gradient methods for reinforcement learning with function approximation. In Proceedings of NIPS 1999.
  • R. Sutton, D. Precup, and S. Singh (1999b) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence.
  • D. Tang, B. Qin, and T. Liu (2015) Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of EMNLP 2015.
  • D. Tang, B. Qin, and T. Liu (2016) Aspect level sentiment classification with deep memory network. In Proceedings of EMNLP 2016.
  • I. Titov and R. McDonald (2008) A joint model of text and aspect ratings for sentiment summarization. In Proceedings of ACL 2008.
  • B. Wang and W. Lu (2018) Learning latent opinions for aspect-level sentiment classification. In Proceedings of AAAI 2018.
  • H. Wang, Y. Lu, and C. Zhai (2010) Latent aspect rating analysis on review text data: a rating regression approach. In Proceedings of KDD 2010.
  • J. Wang, J. Li, S. Li, Y. Kang, M. Zhang, L. Si, and G. Zhou (2018) Aspect sentiment classification with both word-level and clause-level attention networks. In Proceedings of IJCAI 2018.
  • Y. Wang, M. Huang, X. Zhu, and L. Zhao (2016) Attention-based LSTM for aspect-level sentiment classification. In Proceedings of EMNLP 2016.
  • R. Williams (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8.
  • Y. Yang and X. Liu (1999) A re-examination of text categorization methods. In Proceedings of SIGIR 1999.
  • Y. Yin, Y. Song, and M. Zhang (2017) Document-level multi-aspect sentiment classification as machine comprehension. In Proceedings of EMNLP 2017.
  • T. Zhang, M. Huang, and L. Zhao (2018) Learning structured representation for text classification via reinforcement learning. In Proceedings of AAAI 2018.
  • Y. Zou, T. Gui, Q. Zhang, and X. Huang (2018) A lexicon-based supervised attention model for neural sentiment analysis. In Proceedings of COLING 2018.