Detecting Domain Polarity-Changes of Words in a Sentiment Lexicon

04/29/2020 · Shuai Wang, et al. · University of Illinois at Chicago, USTC

Sentiment lexicons are instrumental for sentiment analysis. One can use a set of sentiment words provided in a sentiment lexicon and a lexicon-based classifier to perform sentiment classification. One major issue with this approach is that many sentiment words are domain dependent. That is, they may be positive in some domains but negative in some others. We refer to this problem as domain polarity-changes of words. Detecting such words and correcting their sentiment for an application domain is very important. In this paper, we propose a graph-based technique to tackle this problem. Experimental results show its effectiveness on multiple real-world datasets.




1 Introduction

Sentiment words, also called opinion or polar words, are words that convey positive or negative sentiments Pang and Lee (2008). Such sentiment-bearing words are usually pre-compiled as word lists in a sentiment lexicon, which is an instrumental and important linguistic resource for sentiment analysis Liu (2012). Numerous studies on lexicon construction have been reported; we discuss them in Section 2.

Despite the extensive research on lexicon construction, limited work has been done on identifying and handling sentiment words in a lexicon that have domain-dependent polarities. In real-life applications, there are almost always some sentiment words that express sentiments different from their default polarities provided in a general-purpose sentiment lexicon. For example, in the sentiment lexicon compiled by Hu and Liu (2004), the word “crush” is associated with a negative sentiment, but it actually expresses a positive opinion in the blender domain, because “crush” indicates that a blender works well, e.g., in the sentence “it does crush the ice!”. We call this problem the domain polarity-change of a word in a sentiment lexicon.

The polarity change of words plays a crucial role in sentiment classification. As we will see in the experiment section, without identifying and correcting such domain-dependent sentiment words, sentiment classification performance can be markedly worse. Although some researchers have studied the domain-specific sentiment problem with sentiment lexicons, their focuses are quite different and their approaches are not suitable for our task. We discuss them further in the following sections.

It is important to note that our work mainly aims to help lexicon-based sentiment classification approaches Taboada et al. (2011). It does not directly help machine-learning (ML) or supervised learning approaches Li et al. (2018b), because the domain-dependent polarities of words are already reflected in the manually labeled training data. Notice that those ML approaches require manual annotation for each application domain, which is a time-consuming and labor-intensive task, and thus hard to scale up. In many real-world scenarios, lexicon-based approaches are useful and can be a better alternative Liu (2012).

However, to effectively apply a sentiment lexicon to an application domain, the domain polarity-change problem discussed above needs to be addressed. To this end, we propose a new graph-based approach named Domain-specific Sentiment Graph (DSG). It works in three main steps: (domain) sentiment word collection, (domain) sentiment correlation extraction, and graph construction and inference, which will be detailed in Section 4. Our experimental results show its effectiveness in detecting domain polarity-changes of words on multiple real-world datasets. We will also see that with the detected and corrected polarities of those words, a large performance gain can be achieved in sentiment classification.

2 Related Work

This work concerns domain polarity-changes of words in lexicons. We thus first discuss work related to sentiment lexicons, then domain sentiment, and finally domain sentiment with lexicons.

Extensive studies have been done on sentiment lexicons, and the majority of them focus on lexicon construction. These approaches can be generally categorized as dictionary-based and corpus-based.

Dictionary-based approaches first used some sentiment seed words to bootstrap based on the synonym and antonym structure of a dictionary Hu and Liu (2004); Valitutti et al. (2004). Later on, more sophisticated methods were proposed Kim and Hovy (2004); Esuli and Sebastiani (2005); Takamura et al. (2007); Blair-Goldensohn et al. (2008); Rao and Ravichandran (2009); Mohammad et al. (2009); Hassan and Radev (2010); Dragut et al. (2010); Xu et al. (2010); Peng and Park (2011); Gatti and Guerini (2012); San Vicente et al. (2014). Corpus-based approaches build lexicons by discovering sentiment words in a large corpus. The first idea is to exploit some coordinating conjunctions Hatzivassiloglou and McKeown (1997); Hassan and Radev (2010). Kanayama and Nasukawa (2006) extended this approach by introducing inter-sentential sentiment consistency. Other related work includes Kamps et al. (2004); Kaji and Kitsuregawa (2006); Wang et al. (2017). The second idea is to use syntactic relations between opinion and aspect words Zhuang et al. (2006); Wang and Wang (2008); Qiu et al. (2011); Volkova et al. (2013). The third idea is to use word co-occurrences for lexicon induction Turney and Littman (2003); Igo and Riloff (2009); Velikovich et al. (2010); Yang et al. (2014); Rothe et al. (2016).

However, our work is very different as we focus on detecting domain dependent sentiment words in a given general-purpose sentiment lexicon.

Also related is existing research on domain- and context-dependent sentiment. First, although several researchers have studied context-dependent sentiment words, which are based on specific sentences and topic/aspect contexts Wilson et al. (2005); Ding et al. (2008); Choi and Cardie (2008); Wu and Wen (2010); Jijkoun et al. (2010); Lu et al. (2011); Zhao et al. (2012); Kessler and Schütze (2012); Teng et al. (2016); Wang et al. (2018); Li et al. (2018a), our work is based on domains. Second, while studies on transfer learning or domain adaptation for sentiment analysis deal with domain information Bhatt et al. (2015); Yu and Jiang (2016); Li et al. (2018b), our work does not lie in this direction: we have no source domain, and our goal is not to transfer domain knowledge to another domain. Third, and most importantly, the above works are either irrelevant to lexicons or not aimed at detecting the sentiment discrepancy problem found in lexicons with respect to a particular domain.

Our work is most related to the following studies that involve both sentiment lexicons and the domain sentiment problem. Choi and Cardie (2009) adapted the word-level polarities of a general-purpose sentiment lexicon to a particular domain by utilizing the expression-level polarities in that domain. However, their work targeted reasoning about the sentiment polarities of multi-word expressions. It does not detect or revise the sentiment polarities of individual words in the lexicon for a particular domain, and hence cannot solve our problem. Du et al. (2010) studied the problem of adapting a sentiment lexicon from one domain to another. It further assumes that the source domain has a set of sentiment-labeled reviews. Their technique is therefore closer to transfer learning, and their learning setting differs from ours intrinsically. Perhaps the most related work is Hamilton et al. (2016), which generates domain-specific lexicons using some seed lexicon words, word embeddings, and a random walk algorithm. However, their model is primarily for lexicon construction, guided by domain-specific information. It does not aim to detect or change sentiment polarities in a given lexicon, and is thus not directly applicable to our task. To make it workable, we design a two-step approach, which will be detailed in the experiment section (Section 5).

3 Problem Definition

We first give the formal definition of the problem of detecting domain polarity-changes of words in a lexicon. Definition: given a general-purpose sentiment lexicon L (a lexicon containing sentiment words and their default sentiment polarities) and a review corpus D of an application domain, identify the subset of words in L that have different sentiment polarities in that domain (different from their default polarities). We call these polarity-changed sentiment words and denote them by S.

In the rest of this paper, we call the words in a given lexicon lexical words for short. The term detection will generally stand for the detection of domain polarity-changes of words.
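The detection task's input and output can be sketched as follows. This is a minimal illustration only, not the proposed method: the beliefs are toy values standing in for whatever in-domain estimates a model produces, and the word lists are hypothetical.

```python
# Sketch of the detection task's I/O: given lexicon default polarities and
# domain-induced beliefs (however obtained), return the words whose in-domain
# polarity disagrees with the lexicon default. All values here are toys.

def polarity_changed_words(lexicon, domain_belief_pos, threshold=0.5):
    """lexicon: word -> +1/-1 default polarity;
    domain_belief_pos: word -> estimated P(positive) in the domain."""
    changed = set()
    for w, pol in lexicon.items():
        if w in domain_belief_pos:
            induced = 1 if domain_belief_pos[w] > threshold else -1
            if induced != pol:
                changed.add(w)
    return changed

lex = {"crush": -1, "noisy": -1, "brilliant": 1}
beliefs = {"crush": 0.8, "noisy": 0.2, "brilliant": 0.9}   # blender domain (toy)
print(polarity_changed_words(lex, beliefs))  # {'crush'}
```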

4 Proposed Solution

To tackle the above problem, we propose a graph-based learning approach named Domain-specific Sentiment Graph (DSG). It works with three major steps: (1) (domain) sentiment words collection, (2) (domain) sentiment correlation extraction, and (3) graph construction and inference.

Specifically, it first collects a set W of sentiment words mentioned in the domain corpus D. It then mines multiple types of relationships among the sentiment words in W, denoted as a relationship set R. The relationships are identified based on different types of linguistic connectivity. Next, it builds a probabilistic graph with each node representing a sentiment word in W and each edge representing a relation (from R) between two words. An inference method is then applied to re-estimate the domain-specific polarities (or beliefs) of the sentiment words. With the re-estimated beliefs obtained in the application domain, sentiment words with changed polarities can be detected, based on the sentiment shift of a lexical word between its induced (in-domain) sentiment belief and its original (lexicon-based) polarity.

In this learning manner, the proposed approach requires no prior knowledge or annotated data about a particular domain. It is thus applicable across different domains. Intuitively, the approach rests on two assumptions:

Assumption 1: Sentiment Consistency Abelson (1983); Liu (2012): a sentiment expression tends to be sentimentally coherent with its context. Notice that sentiment consistency can be reflected in multiple types of conjunction like “and”, “or”, etc., which will be explained in Section 4.2. In fact, this assumption is common in sentiment analysis and has been used in many studies Kanayama and Nasukawa (2006); Hassan and Radev (2010).

Assumption 2: The number of domain polarity-changed lexical words is much smaller than the number of those whose polarities do not change. This assumption ensures that we can rely on the general-purpose lexicon itself for detection. In other words, the real polarity of a sentiment word in a certain domain can be distilled by its connections with other (mostly polarity-unchanged) words whose polarities are known from the lexicon.

4.1 Sentiment Word Collection

As the first step, DSG collects sentiment words in an application domain corpus, including the sentiment words not in a sentiment lexicon. Specifically, we consider three types of (likely) sentiment words: (1) The word appears in a given lexicon. (2) The word is an adjective in the corpus. (3) The word is an adverb in the corpus and has an adjective form.

We simply accept all lexical words and adjectives as (likely) sentiment words, which does not cause serious problems in our experiments; both are also commonly used in the literature Liu (2012). However, we impose constraints on selecting adverbs. While adverbs like “quickly” and “nicely” do express sentiment, others like “very” and “often” may not function the same way. We thus use only adverbs that have adjective forms.

Notice that in the above setting, sentiment words not in the lexicon are also collected, for two reasons. First, they are useful for building connections among other lexical words for inference purposes. Suppose that “quiet” is a sentiment word (found because it is an adjective) and it is not in the given lexicon. Given its sentiment correlations with other words, like “efficient and quiet” and “quiet and quick”, it can form a path between “efficient” and “quick” in the graph. Second, in each domain there exist a number of sentiment words not covered by the given lexicon. Their inferred polarities can also benefit the graph reasoning process, though those words are not the focus of this study (we aim at detecting the polarity changes of lexical words). For instance, if the non-lexical word “quiet” is identified as expressing positive sentiment, “efficient” and “quick” are more likely to be positive, given their in-domain sentiment correlations. We follow Das and Chen (2007); Pang and Lee (2008) to handle negation: a negated phrase like “not bad” is treated as a single word “not_bad” and its sentiment polarity is reversed accordingly. Finally, all extracted words are modeled as nodes in the graph.
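The collection step above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes tokens arrive pre-tagged with coarse POS tags, and the toy lexicon, the adverb-to-adjective map, and the negation list are placeholders.

```python
# Sketch of the sentiment-word collection step (Section 4.1).
# The lexicon, the ADV->ADJ mapping, and the negation list are toy stand-ins.

LEXICON = {"efficient": 1, "bad": -1, "noisy": -1}      # word -> default polarity
ADV_TO_ADJ = {"quickly": "quick", "nicely": "nice"}     # adverbs with adjective forms
NEGATIONS = {"not", "never", "no"}

def collect_sentiment_words(tagged_tokens):
    """Collect (likely) sentiment words: lexicon hits, adjectives, and
    adverbs with an adjective form. A negated sentiment word is merged
    into a single 'not_<word>' token, as in Das and Chen (2007)."""
    words = []
    skip_next = False
    for i, (tok, pos) in enumerate(tagged_tokens):
        if skip_next:
            skip_next = False
            continue
        if tok in NEGATIONS and i + 1 < len(tagged_tokens):
            nxt, nxt_pos = tagged_tokens[i + 1]
            if nxt in LEXICON or nxt_pos == "ADJ":
                words.append("not_" + nxt)
                skip_next = True
                continue
        if tok in LEXICON or pos == "ADJ" or (pos == "ADV" and tok in ADV_TO_ADJ):
            words.append(tok)
    return words

tagged = [("not", "ADV"), ("bad", "ADJ"), ("and", "CONJ"),
          ("quite", "ADV"), ("quiet", "ADJ"), ("quickly", "ADV")]
print(collect_sentiment_words(tagged))  # ['not_bad', 'quiet', 'quickly']
```

Note that “quite” is skipped because it has no adjective form, matching the constraint on adverbs above.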

4.2 Sentiment Correlation Extraction

This step extracts multiple types of conjunction relationships among sentiment words, which we refer to as sentiment correlation. (The term sentiment correlation in this paper denotes the correlation between two sentiment words in a domain, which need not be the same as its use in other studies.) The key idea is to use sentiment consistency Abelson (1983); Liu (2012) (see also Assumption 1) to construct relationships among the sentiment words collected in the previous step. Specifically, in an application domain, five types of sentiment correlation are considered, each represented as a triple (word1, correlation type, word2). They will be used in the subsequent graph inference (discussed in the next subsection). Their definitions are shown in Table 1.

Name | Correlation | Example | Representation | Agreement Level
AND | connecting with “and” | “it is efficient and quiet” | (efficient, AND, quiet) | Strongly Agree
OR | connecting with “or” | “everything as expected or better” | (expected, OR, better) | Agree
NB | neighboring words | “a reasonably quiet fridge” | (reasonably, NB, quiet) | Weakly Agree
ALT | “although”, “though” | “too noisy, though it is efficient” | (noisy, ALT, efficient) | Disagree
BUT | “but”, “however” | “it is a powerful but noisy machine” | (powerful, BUT, noisy) | Strongly Disagree
Table 1: Five types of sentiment correlation.

In each sentence, when a specific type of relationship between two (collected) sentiment words is found, a triple is created. For instance, from the sentence “it is efficient and quiet”, the triple (efficient, AND, quiet) is generated. The extraction of the OR correlation is similar to AND. Likewise, the triple (powerful, BUT, noisy) is extracted from the sentence “it is a powerful but noisy machine”, and the extraction of ALT (abbreviation for although) is similar to BUT. NB denotes two neighboring sentiment words in a sentence, as in “reasonably good”.

Notice that while the five types of relationships are jointly considered, they are associated with different agreement levels (parameterized in the graphical model discussed below). Here the agreement level measures how likely the sentiment polarities of two connected words are the same. Intuitively, AND gives the highest agreement level. For instance, “bad and harmful” is very common but “good and harmful” is much less likely. It is also intuitive that BUT indicates the strongest disagreement between two sentiment words. Note that we only consider pairwise relationships between sentiment words in this study, which already generates reasonably good results, as we will see.
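The triple extraction described above can be sketched as follows. This is a hypothetical simplification, not the authors' extractor: it only matches a connective directly between two collected sentiment words, and the connective list and word set are toys.

```python
# Hypothetical sketch of correlation-triple extraction (Table 1): scan a
# tokenized sentence for two collected sentiment words joined by a
# connective, and emit (word, TYPE, word) triples.

CONNECTIVES = {"and": "AND", "or": "OR", "though": "ALT",
               "although": "ALT", "but": "BUT", "however": "BUT"}

def extract_triples(tokens, sentiment_words):
    triples = []
    for i, tok in enumerate(tokens):
        ctype = CONNECTIVES.get(tok)
        if ctype and 0 < i < len(tokens) - 1:
            left, right = tokens[i - 1], tokens[i + 1]
            if left in sentiment_words and right in sentiment_words:
                triples.append((left, ctype, right))
    # NB: any two directly neighboring sentiment words
    for a, b in zip(tokens, tokens[1:]):
        if a in sentiment_words and b in sentiment_words:
            triples.append((a, "NB", b))
    return triples

sw = {"efficient", "quiet", "powerful", "noisy", "reasonably"}
print(extract_triples("it is efficient and quiet".split(), sw))
# [('efficient', 'AND', 'quiet')]
print(extract_triples("a reasonably quiet fridge".split(), sw))
# [('reasonably', 'NB', 'quiet')]
```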

4.3 Graph Construction and Inference

This subsection illustrates how our proposed domain-specific sentiment graph is constructed and used for word detection after the above two steps.

4.3.1 Constructing Markov Random Field

Markov Random Fields (MRFs) are a class of probabilistic graphical models for inference problems with uncertainty in observed data. An MRF works on an undirected graph G = (V, E), constructed from a set of vertices/nodes V and a set of edges/links E. In the graph G, each node v_i ∈ V denotes a random variable y_i, and each edge e_ij ∈ E denotes a statistical dependency between nodes v_i and v_j. Formally, φ_i(y_i) and ψ_ij(y_i, y_j) are defined as two types of potential functions, encoding the observation (or prior) of a node and the dependency between two nodes; they are called the node potential and edge potential, respectively. An MRF thus models a joint distribution over the set of random variables, and its aim is to infer the marginal distribution P(y_i) for every node. With an inference method, the estimate of the marginal distribution of a node can be obtained, which is also called its belief.

The reason why we formulate our domain-specific sentiment graph as an MRF is three-fold: (1) The sentiment correlation between two words is a mutual relationship, as one word can provide useful sentiment information about the other and vice versa, which is properly formulated in an undirected graph. (2) From a probabilistic perspective, the polarity changes of sentiment words can be naturally understood as a belief estimation problem. That is, on one hand, we have an initial belief about the polarity of a lexical word (known from the lexicon, e.g., the word “cold” is generally negative), which is essentially the prior. On the other hand, our goal is to infer the real polarity of a word in a specific application domain, which is reflected in its final estimated belief (e.g., “cold” is positive in the fridge domain). Concretely, the polarity of a sentiment word is modeled as a 2-dimensional vector standing for the probability distribution over positive and negative polarities; e.g., [0.9, 0.1] indicates that a word is very likely to express a positive sentiment in an application domain. We can further use a parameter ε to simplify the representation as [0.5 + ε, 0.5 − ε]. (3) Recall that multiple types of sentiment correlation are used and treated differently in our approach; these typed sentiment correlations can be encoded in the MRF model, as illustrated next.

4.3.2 Inference over Typed Correlation

As discussed above, the inference task in an MRF is to compute the marginal distribution (or posterior probability) of each node given the node priors and the edge potentials. Efficient algorithms for exact inference, like Belief Propagation Pearl (1982), are available for certain graph topologies, but for general graphs involving cycles exact inference is computationally intractable, so approximate inference is needed. Loopy Belief Propagation Murphy et al. (1999) is such an approximate solution using iterative message passing. A message from node i to node j is based on all messages from the other nodes to node i except the one from node j itself. It works as:

m_{i→j}(y_j) = α Σ_{y_i ∈ S} φ_i(y_i) ψ_{ij}(y_i, y_j) ∏_{k ∈ N(i)∖{j}} m_{k→i}(y_i)    (1)

where S denotes the possible states of a node, i.e., being a sentiment word with positive or negative polarity, and y_i = s indicates that node i is in state s. N(i) denotes the neighborhood of i, i.e., the other nodes linked to node i. m_{i→j} is the message passed from node i to node j, and α is a normalization constant that makes the message proportional to the likelihood of node j being in state y_j, given the evidence from i in all its possible states. After iterative message passing, the final belief is estimated as:

b_i(y_i) = β φ_i(y_i) ∏_{k ∈ N(i)} m_{k→i}(y_i)    (2)

where β is a normalization term that makes Σ_{y_i} b_i(y_i) = 1. In this case, b_i can be viewed as the posterior probability of sentiment word i having positive or negative polarity.

However, notice that in the above setting, edges are not distinguished by their type of sentiment correlation. In other words, every possible connection between words is treated intrinsically the same, which does not meet our modeling needs. In order to encode the typed sentiment correlations defined in the previous sections, we propose to replace the edge potential in Eq. 1 with a typed one:

m_{i→j}(y_j) = α Σ_{y_i ∈ S} φ_i(y_i) ψ^{t(i,j)}(y_i, y_j) ∏_{k ∈ N(i)∖{j}} m_{k→i}(y_i)    (3)

where t(i,j) indicates the specific type of sentiment correlation between node i and node j, which can be any of the types defined in Section 4.2, such as AND or BUT. ψ^{t(i,j)} thus becomes an edge potential function tied to its sentiment correlation type. Each correlation type is parameterized as a (state) transition matrix, shown in Table 2. The five types of sentiment correlation therefore result in five such matrices with different values of ε. For example, ε for AND can be set to 0.3 as it indicates the highest agreement level, while ε for NB can be set to 0.1 as it is regarded as weak agreement. For BUT, ε can be set to −0.3 as it indicates strong disagreement.

State | Positive | Negative
Positive | 0.5 + ε | 0.5 − ε
Negative | 0.5 − ε | 0.5 + ε
Table 2: Transition/Propagation matrix.
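The typed inference above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ε values follow Table 4 and the matrix shape follows Table 2, while the toy graph, the priors, and the iteration count are assumptions for demonstration.

```python
import numpy as np

# Minimal loopy belief propagation over a typed sentiment graph (Eqs. 1-3).
# Words, priors, and edges below are illustrative toys.

EPS = {"AND": 0.20, "OR": 0.10, "NB": 0.05, "ALT": -0.10, "BUT": -0.20}

def edge_potential(ctype):
    e = EPS[ctype]
    # Table 2: same-state entries 0.5+eps, cross-state entries 0.5-eps
    return np.array([[0.5 + e, 0.5 - e],
                     [0.5 - e, 0.5 + e]])

def loopy_bp(priors, edges, n_iter=20):
    """priors: word -> np.array([P(pos), P(neg)]); edges: (w1, type, w2) triples."""
    nbrs = {w: [] for w in priors}
    msgs = {}
    for a, t, b in edges:
        nbrs[a].append((b, t))
        nbrs[b].append((a, t))
        msgs[(a, b)] = np.ones(2)
        msgs[(b, a)] = np.ones(2)
    for _ in range(n_iter):
        new = {}
        for (i, j) in msgs:
            t = next(t for n, t in nbrs[i] if n == j)
            # prior times all incoming messages to i, excluding the one from j
            prod = priors[i].copy()
            for k, _ in nbrs[i]:
                if k != j:
                    prod = prod * msgs[(k, i)]
            m = edge_potential(t).T @ prod     # sum over y_i (Eq. 3)
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = {}
    for w in priors:
        b = priors[w].copy()
        for k, _ in nbrs[w]:
            b = b * msgs[(k, w)]
        beliefs[w] = b / b.sum()               # Eq. 2
    return beliefs

priors = {"efficient": np.array([0.7, 0.3]),   # positive lexical word
          "crush":     np.array([0.3, 0.7]),   # negative lexical word
          "quiet":     np.array([0.5, 0.5])}   # non-lexical word
edges = [("efficient", "AND", "crush"), ("quiet", "NB", "efficient")]
beliefs = loopy_bp(priors, edges)
print({w: b.round(3).tolist() for w, b in beliefs.items()})
```

On this toy graph, the AND edge to the positive word “efficient” pulls the belief of “crush” toward positive, illustrating how an in-domain polarity change can emerge from the correlations.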

For each word, with its estimated belief obtained in the application domain, its polarity change score (pcs) is defined as:

pcs(w) = 1[pol(w) = positive] · b_w(negative) + 1[pol(w) = negative] · b_w(positive)    (4)

where pol(w) denotes the original sentiment polarity of a lexical word w in the lexicon, b_w is its estimated in-domain belief, and 1[·] is the indicator function. A word list ranked by pcs is then used to identify the most likely polarity-changed sentiment words; e.g., one can select the top-n words or set a threshold for word extraction. In this way, the sentiment words with changed polarities in each domain can be detected.
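Eq. 4 can be sketched in a few lines. This is an illustrative simplification with made-up belief values (the words echo the example discussed in Section 5.7).

```python
# Polarity-change score (Eq. 4), sketched: for a lexical word, pcs is the
# induced in-domain belief mass on the opposite of its lexicon polarity.

def pcs(lexicon_polarity, belief_pos):
    """lexicon_polarity: +1 or -1; belief_pos: estimated P(positive) in the domain."""
    if lexicon_polarity > 0:
        return 1.0 - belief_pos   # belief mass on 'negative'
    return belief_pos             # belief mass on 'positive'

# 'crush' is negative in the lexicon but believed positive in the blender domain:
ranked = sorted([("excellent", pcs(+1, 0.9)),
                 ("crush", pcs(-1, 0.7)),
                 ("terrible", pcs(-1, 0.1))],
                key=lambda x: x[1], reverse=True)
print(ranked)  # [('crush', 0.7), ...]
```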

5 Experiments

We conducted experiments on multiple datasets with several candidate solutions. We first compare their performance on the word detection task; sentiment classification is then used as a second task to evaluate the effect of the polarity corrections.

5.1 Experimental Setup

Dataset. Four real-world datasets from different domains/products were used: fridge, blender, washer, and movie. The first three datasets contain review sentences provided by a collaborating business company. The fourth dataset (movie) consists of tweets discussing movies, which we collected from Twitter. The first dataset contains 32,000 (32k) sentences, the second 16,000 (16k), and the remaining two 10,000 (10k) each. Their data sizes can thus be viewed as large, medium, and small. Such product diversity and variable size settings help evaluate the generality of each solution. Only the text is used by all candidate models.

In addition, two other datasets from domains drill and vacuum cleaner are used as development sets for parameter selections. drill contains 76k and vacuum cleaner contains 10k sentences.

Sentiment Lexicon. We used a general-purpose sentiment lexicon from Hu and Liu (2004), which contains 2,006 positive and 4,783 negative lexical words. A candidate model will find polarity-changed words from them for each domain.

Parameter Settings. The hyper-parameters of the state priors and the (typed) transition matrices in DSG are shown in Tables 3 and 4. They were empirically set based on the word detection performance on the two development datasets. We found this parameter setting works generally well on both datasets, although they are from different domains and have different data sizes. The evaluation results reported below are based on this setting, and as we will see, it already produces quite good results.

Prior P(positive) | Positive Lexical Words: 0.70 | Non-Lexical Words: 0.50 | Negative Lexical Words: 0.30
Table 3: Parameters of state prior.
ε | AND: 0.20 | OR: 0.10 | NB: 0.05 | ALT: −0.10 | BUT: −0.20
Table 4: Parameters of typed transition matrix.

5.2 Lexicon-based Sentiment Classifier

Our evaluations include lexicon-based sentiment classification, so we briefly explain how a lexicon-based sentiment classifier (called classifier for short) works. It works with a lexicon, in which each word is associated with a sentiment score (e.g., -1/+1). The classifier calculates the sentiment score of each sentence by summing the scores of its words. We follow the lexicon-based classifier design of Taboada et al. (2011), incorporating sentiment negation and intensity. The sentiment score of a sentence s is calculated as:

score(s) = Σ_{w ∈ s} neg(w) · int(w) · score(w)    (5)

where score(w) is the lexicon score of word w, neg(w) flips the sign of a negated word, and int(w) scales the score of an intensified word.

Different lexicons used with this classifier will generate different results. That is, even if their lexical words are the same, the sentiment score associated with a lexical word can vary, and Eq. 5 will thus make different predictions. This is how we use the classifier to verify the effect of word detection: the classifier performs differently with the original lexicon and the modified lexicon, and the results can be compared in a before-and-after manner. Here the modified lexicon means the original lexicon with the sentiment polarities of the (detected) lexical words corrected. For example, “crush” is associated with negative sentiment (-1) in the original lexicon, but it could be associated with positive sentiment (+1) in the modified lexicon (if detected), so sentence-level sentiment scores will vary accordingly; e.g., “the machine does crush ice!” will be predicted as a positive sentence with the modified lexicon.
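The before-and-after comparison can be sketched as follows. This is a minimal classifier in the spirit of Taboada et al. (2011), not their implementation: the word lists and the 1.5x intensity factor are illustrative choices.

```python
# Minimal lexicon-based classifier sketch: sum per-word scores, flipping
# negated words and scaling intensified ones. Lists/factors are toys.

LEXICON = {"crush": -1, "brilliant": 1, "noisy": -1, "efficient": 1}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "insanely": 1.5, "so": 1.5}

def sentence_score(tokens, lexicon):
    score, flip, weight = 0.0, 1, 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            flip = -1
        elif tok in INTENSIFIERS:
            weight = INTENSIFIERS[tok]
        elif tok in lexicon:
            score += flip * weight * lexicon[tok]
            flip, weight = 1, 1.0   # shifters apply to the next sentiment word
    return score

# Before/after polarity correction for 'crush' in the blender domain:
sent = "it does crush the ice".split()
print(sentence_score(sent, LEXICON))                  # -1.0 (original lexicon)
print(sentence_score(sent, {**LEXICON, "crush": 1}))  # 1.0 (modified lexicon)
```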

5.3 Candidate Models

Original Lexicon (OL): This is a baseline for sentiment classification evaluations only (Section 5.5), which uses the classifier with the original lexicon.
Domain-specific Sentiment Graph (DSG): This is our model. It and the following two models will be used for both word detection and classification.
Lexicon-Classifier Inconsistency (LCI): This is a heuristic solution for detecting polarity-changed sentiment words. It relies on the inconsistency between the sentiment of a lexical word (obtained from the original lexicon) and the sentiment of the sentences containing the word (induced by the classifier). Concretely, it first calculates the sentiment polarities of all sentences using a classifier with the original lexicon. With the polarities of the sentences known, it computes an inconsistency ratio for each lexical word: the ratio of (a) to (b), where (a) is the number of sentences containing the word whose polarity (positive/negative) is the opposite of the word's own lexicon polarity, and (b) is the number of all sentences covering that word. Finally, it ranks all lexical words by their ratio values to produce a list of likely polarity-changed words.
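The LCI heuristic can be sketched as follows. This is an illustrative simplification: the sentence polarity labels are toy inputs standing in for the classifier's predictions.

```python
from collections import defaultdict

# Sketch of the LCI heuristic: a lexical word's inconsistency ratio is the
# fraction of its covering sentences whose classifier-predicted polarity
# disagrees with the word's lexicon polarity. Labels below are toy inputs.

def inconsistency_ratios(sentences, sentence_polarity, lexicon):
    """sentences: list of token lists; sentence_polarity: parallel list of
    +1/-1 labels induced by the lexicon-based classifier."""
    hits = defaultdict(int)      # word -> number of sentences covering it
    clashes = defaultdict(int)   # word -> number of disagreeing sentences
    for tokens, pol in zip(sentences, sentence_polarity):
        for w in set(tokens):
            if w in lexicon:
                hits[w] += 1
                if lexicon[w] != pol:
                    clashes[w] += 1
    return {w: clashes[w] / hits[w] for w in hits}

lex = {"crush": -1, "great": 1}
sents = [["it", "does", "crush", "the", "ice"],
         ["great", "crush"],
         ["crush", "broke"]]
labels = [1, 1, -1]   # classifier-induced sentence polarities (toy)
ratios = inconsistency_ratios(sents, labels, lex)
print(sorted(ratios.items(), key=lambda kv: -kv[1]))
```

Here “crush” appears in three sentences, two of which are predicted positive against its negative lexicon polarity, giving it the highest ratio.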

SentProp (SP): SentProp Hamilton et al. (2016) is a lexicon construction algorithm that considers domain information, and is the work most related to ours. As discussed in Section 2, it is not directly applicable to the detection task. But since it can generate a list of domain-specific sentiment words associated with positive/negative scores (estimated by SentProp, which can be treated as in-domain beliefs like DSG's), we design a two-step approach to achieve our goal. First, we downloaded and ran the SentProp system to learn a domain-specific lexicon for each domain. Second, we calculate the polarity change scores for all lexical words, as DSG does, based on the learned domain-specific sentiment scores and the original polarities from the lexicon, using Eq. 4. Similar to DSG, it produces a list of words ranked by polarity change score. For its parameter selection, we tried both the system defaults, following the code instructions and the original paper, and parameter fine-tuning based on the performance on the two development sets (as for DSG), so as to report its best performance.

5.4 Correct Detection of Words

As each candidate model generates a list of words ranked by polarity-change score, the top-ranked ones are the most likely polarity-changed words and are used as the detected words. For evaluation, the top-n words from each model are inspected and the number of correct detections is counted, denoted as #C@n in Table 5.

Specifically, two domain experts who are familiar with the domain sentiments identified and annotated the correct polarity-changed words from the top-20 shortlisted candidate words generated by each model. For each candidate word, we sampled a number of sentences containing that word for the domain experts to judge. A candidate word needs to be agreed on by both experts to be counted as correct. The Cohen's Kappa agreement score is 0.817.
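For reference, the agreement measure used here can be computed as below. This is a generic Cohen's kappa sketch with toy annotator labels, not the paper's actual annotation data.

```python
# Cohen's kappa for two annotators' word judgments, with toy labels.

def cohens_kappa(a, b):
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n              # observed agreement
    labels = set(a) | set(b)
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)  # chance agreement
    return (po - pe) / (1 - pe)

ann1 = ["changed", "changed", "unchanged", "changed", "unchanged"]
ann2 = ["changed", "unchanged", "unchanged", "changed", "unchanged"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.615
```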

Model | #C@n | fridge | blender | washer | movie
DSG | #C@5 | 5 | 5 | 4 | 5
DSG | #C@10 | 9 | 10 | 6 | 9
DSG | #C@20 | 12 | 15 | 12 | 15
LCI | #C@5 | 3 | 3 | 3 | 1
LCI | #C@10 | 5 | 3 | 5 | 4
LCI | #C@20 | 5 | 7 | 7 | 9
SP | #C@5 | 1 | 0 | 1 | 1
SP | #C@10 | 2 | 0 | 2 | 4
SP | #C@20 | 3 | 3 | 3 | 6
Table 5: Detection of polarity-changed words.

Evaluation results are reported in Table 5, where we can see that DSG achieves outstanding results consistently. LCI also does a decent job, while SP does not perform well on this task.

Next, we evaluate the impact of this detection using each model's top-20 words; the following subsections give further analyses based on their correctly detected words.

5.5 Sentiment Classification

After the detection of polarity-changed words, we conduct classification on the sentences containing at least one word detected by any model, because the classification results on sentences containing no detected word would not be affected (the original and modified lexicons yield the same predictions on them).

For evaluation, we sampled and labeled 925 (around 1k) sentences from all sentences that could be affected. We used a stratified sampling strategy and, at the same time, set a minimum number of sentences per word, to make sure each detected word is covered. The numbers of labeled sentences for the four domains are 232, 214, 174, and 305. The Cohen's Kappa agreement score is 0.788.

In regard to lexicon-based classification, for DSG and LCI the modified lexicon for each domain is obtained by correcting the original lexicon (OL) for that domain. For SP, its self-generated lexicon is used with its inferred sentiment scores.

Model | fridge | blender | washer | movie | AVG
DSG | 74.56% | 80.84% | 77.01% | 84.91% | 79.33%
SP | 68.10% | 78.97% | 66.67% | 87.87% | 75.40%
LCI | 61.63% | 68.22% | 62.64% | 62.95% | 63.86%
OL | 61.20% | 65.42% | 62.06% | 56.72% | 61.35%
Table 6: Sentiment classification accuracy.

Table 6 reports the classification accuracy, from which we have the following observations:

  1. Compared to the baseline using the original lexicon (OL), DSG greatly improves the accuracy, by 17.98% on average. This shows the usefulness of detecting polarity changes of lexical words for sentiment classification.

  2. SP also produces very good results. The reason is that, as an essentially lexicon-generation approach, SP creates a bigger lexicon for each domain (around 2 times bigger than OL), including additional sentiment words outside the original lexicon. In other words, discovering more sentiment words (providing more sentiment clues) can also help classification. Note that this does not contradict the importance of detecting polarity-changed words, as they are two different aspects. We discuss this further shortly.

  3. LCI outperforms OL, but its performance gain is small. The reason is that, although LCI detects polarity-changed words decently, its detected words affect a much smaller number of sentences than DSG's and SP's; i.e., the words LCI detects are rarer and less frequent, so fewer sentences are affected.

5.6 Example of Polarity Change in Domains

Here we show some example sentences in which a lexical word expresses a polarity different from its lexicon default in the given domain. We found “good for crushing ice” and “this breaks down frozen fruit” in blender, “damn, I wanna watch it so bad!” and “this movie was insanely brilliant!” in movie. We also found “it keeps things cold and frozen” in fridge and “you can also delay your cycle” in washer.

5.7 Further Analysis and Discussion

We aim to answer two questions here. Q1: What is the key difference between using SP and DSG? Q2: More generally, what is the relationship between existing lexicon generation research and this polarity-change detection problem?

First, let us look more closely at SP. As a lexicon generation approach, its goal is to collect sentiment words from a given corpus and infer their sentiment scores. Two points are worth noting: (a) while SP can discover more sentiment words, the extracted words can be wrong. For example, SP extracts the word “product” as a sentiment word and assigns it a positive (+ve) sentiment, which can lead to misclassification of negative (-ve) sentences containing “product”. (b) While SP directly discovers and scores sentiment words, it does not know which sentiment words carry domain-oriented important information. For example, SP discovers “excellent”, “crush”, and “terrible” for the blender domain and estimates their sentiment scores as 0.9, 0.7, and 0.1 (for simplicity, assume all scores are rescaled to [0.0, 1.0], where 1.0 denotes most +ve and 0.0 most -ve). These scores indicate the words' polarities, but do not reflect the importance/effect of their polarity change for the domain.

For DSG, (a) is avoided because a word like “product” is usually excluded from a general-purpose lexicon. Regarding (b), say the scores of “excellent”, “crush”, and “terrible” are 1.0, 0.0, and 0.0 in the original lexicon; with the domain sentiment re-estimation from DSG they become 0.9, 0.7, and 0.1. Their polarity-change scores are thus inferred as 0.1 (= |1.0 − 0.9|), 0.7 (= |0.0 − 0.7|), and 0.1 (= |0.0 − 0.1|), so “crush” can be identified as an important domain-sentiment-changed word (0.7).
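The change-score computation above can be sketched in a few lines. This is an illustration with the toy scores from the running example; the function name and the 0.5 flagging threshold are our assumptions.

```python
# Sketch of the polarity-change scoring described above (illustrative only;
# function names, variable names, and the threshold are our own assumptions).
original = {"excellent": 1.0, "crush": 0.0, "terrible": 0.0}     # general lexicon
reestimated = {"excellent": 0.9, "crush": 0.7, "terrible": 0.1}  # DSG domain scores

def change_scores(original, reestimated):
    """Absolute difference between original and domain-re-estimated score."""
    return {w: abs(original[w] - reestimated[w])
            for w in original if w in reestimated}

scores = change_scores(original, reestimated)
changed = [w for w, s in scores.items() if s >= 0.5]  # assumed threshold
```

Here only “crush” crosses the threshold, matching the example: a large gap between the general-lexicon score and the domain re-estimate signals a polarity change.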

Certainly, one can compare the SP-generated lexicon to the original/general lexicon; we already did this for the detection task (Table 5). Here we design a variant, SP-dsg-like, that follows this idea for the classification task. The main difference between SP and SP-dsg-like is that SP directly uses its generated lexicon and sentiment scores, while SP-dsg-like uses its generated lexicon to modify the original lexicon (OL), as DSG does. However, SP-dsg-like performs poorly (Table 7), mainly because the modified lexicon (based on OL) does not cover the full set of sentiment words generated by SP.

Model            fridge    blender   washer    movie     AVG
OL               61.20%    65.42%    62.06%    56.72%    61.35%
SP               68.10%    78.97%    66.67%    87.87%    75.40%
SP-dsg-like      67.67%    71.02%    64.36%    63.93%    66.75%
SP-dsg-like+SP   69.40%    77.57%    68.97%    83.28%    74.81%
OL+SP            62.07%    78.04%    70.11%    82.30%    73.13%
LCI+SP           62.07%    77.57%    70.67%    83.61%    73.48%
DSG+SP           72.41%    78.97%    79.89%    88.85%    80.03%
Table 7: Sentiment classification accuracy.

We then combine the two lexicons to give another variant, SP-dsg-like+SP, in which the modified lexicon (based on OL) is expanded with the SP-generated lexicon, which can contain additional sentiment words (outside OL). Similarly, we construct OL+SP and LCI+SP, but they are all inferior to SP (Table 7). The reason is that the key polarity-changed words in the original lexicon have not been corrected, so wrong sentiments remain for the classification task.
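The combination variants above amount to different dictionary-merge rules. Here is an illustrative sketch with toy scores; the variable names and values are our own, not the paper's actual lexicons.

```python
# Illustrative sketch of the lexicon-combination variants compared above.
# Names and toy scores are our own assumptions, not the paper's data.
OL = {"excellent": 1.0, "crush": 0.0}                        # original general lexicon
SP_lex = {"excellent": 0.9, "crush": 0.7, "smoothie": 0.8}   # SP-generated lexicon

# SP-dsg-like: keep OL's vocabulary but overwrite scores with SP's estimates.
sp_dsg_like = {w: SP_lex.get(w, s) for w, s in OL.items()}

# SP-dsg-like+SP: additionally expand with SP-only words outside OL.
sp_dsg_like_plus_sp = {**SP_lex, **sp_dsg_like}

# OL+SP: keep OL's original scores, add SP-only words.
ol_plus_sp = {**SP_lex, **OL}
```

Note that in OL+SP, “crush” keeps its wrong general-purpose score (0.0 here), which is exactly why these expansion-only variants underperform: new words are added, but the key polarity-changed words are never corrected.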

However, since DSG can effectively detect and correct those words, DSG+SP improves the overall results (AVG) and even outperforms DSG or SP alone (Table 7 and Table 6).

It has been demonstrated that both adding more sentiment words (from SP) and fixing a small number of important polarity-changed words (from DSG) help sentiment classification. With DSG+SP working best, we can view lexicon generation and domain polarity-changed word detection as two directions for improving classification, each with its own advantage. Lexicon generation approaches can induce more words and may help find rarer/infrequent ones. Word detection is handy and less risky, as it simply corrects the polarities of a few important lexicon words and does not introduce noise (wrong sentiment words).

Finally, the answers to Q1 and Q2 are: using SP/lexicon generation and DSG/polarity-change detection can both improve sentiment classification, but in different ways. DSG can also effectively detect important polarity-changed words, a task on which SP does not perform well. The two directions can be complementary, as shown by DSG+SP. We hope this work can inspire further relevant research in the future.

6 Conclusion

This paper studied the problem of detecting domain polarity-changed words in a sentiment lexicon. As we have seen, wrong polarities seriously degrade sentiment classification performance. To address the problem, we proposed a novel solution named Domain-specific Sentiment Graph (DSG). Experimental results demonstrated its effectiveness in finding polarity-changed words and the resulting performance gains in sentiment classification.


  • R. P. Abelson (1983) Whatever became of consistency theory? Personality and Social Psychology Bulletin 9 (1), pp. 37–54.
  • H. S. Bhatt, D. Semwal, and S. Roy (2015) An iterative similarity based adaptation technique for cross-domain text classification. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pp. 52–61.
  • S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. A. Reis, and J. Reynar (2008) Building a sentiment summarizer for local service reviews. In Proceedings of the WWW-2008 Workshop on NLP in the Information Explosion Era.
  • Y. Choi and C. Cardie (2008) Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2008), pp. 793–801.
  • Y. Choi and C. Cardie (2009) Adapting a polarity lexicon using integer linear programming for domain-specific sentiment classification. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP-2009), pp. 590–598.
  • S. R. Das and M. Y. Chen (2007) Yahoo! for Amazon: sentiment extraction from small talk on the web. Management Science 53 (9), pp. 1375–1388.
  • X. Ding, B. Liu, and P. S. Yu (2008) A holistic lexicon-based approach to opinion mining. In Proceedings of the Conference on Web Search and Web Data Mining (WSDM-2008), pp. 231–240.
  • E. C. Dragut, C. Yu, P. Sistla, and W. Meng (2010) Construction of a sentimental word dictionary. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM-2010), pp. 1761–1764.
  • W. Du, S. Tan, X. Cheng, and X. Yun (2010) Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM-2010), pp. 111–120.
  • A. Esuli and F. Sebastiani (2005) Determining the semantic orientation of terms through gloss classification. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM-2005), pp. 617–624.
  • L. Gatti and M. Guerini (2012) Assessing sentiment strength in words prior polarities. In Proceedings of the 24th International Conference on Computational Linguistics (COLING-2012).
  • W. L. Hamilton, K. Clark, J. Leskovec, and D. Jurafsky (2016) Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-2016), pp. 595.
  • A. Hassan and D. Radev (2010) Identifying text polarity using random walks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-2010), pp. 395–403.
  • V. Hatzivassiloglou and K. R. McKeown (1997) Predicting the semantic orientation of adjectives. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-1997), pp. 174–181.
  • M. Hu and B. Liu (2004) Mining and summarizing customer reviews. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), pp. 168–177.
  • S. P. Igo and E. Riloff (2009) Corpus-based semantic lexicon induction with web-based corroboration. In Proceedings of the Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics, pp. 18–26.
  • V. Jijkoun, M. de Rijke, and W. Weerkamp (2010) Generating focused topic-specific sentiment lexicons. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-2010).
  • N. Kaji and M. Kitsuregawa (2006) Automatic construction of polarity-tagged corpus from HTML documents. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions (COLING-ACL-2006), pp. 452–459.
  • J. Kamps, M. Marx, R. J. Mokken, and M. de Rijke (2004) Using WordNet to measure semantic orientation of adjectives. In Proceedings of LREC-2004, Vol. 4, pp. 1115–1118.
  • H. Kanayama and T. Nasukawa (2006) Fully automatic lexicon expansion for domain-oriented sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2006), pp. 355–363.
  • W. Kessler and H. Schütze (2012) Classification of inconsistent sentiment words using syntactic constructions. In Proceedings of the 24th International Conference on Computational Linguistics (COLING-2012).
  • S. Kim and E. Hovy (2004) Determining the sentiment of opinions. In Proceedings of the International Conference on Computational Linguistics (COLING-2004), pp. 1367.
  • X. Li, L. Bing, W. Lam, and B. Shi (2018a) Transformation networks for target-oriented sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 946–956.
  • Z. Li, Y. Wei, Y. Zhang, and Q. Yang (2018b) Hierarchical attention transfer network for cross-domain sentiment classification. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-2018), New Orleans, Louisiana, USA, February 2–7, 2018.
  • B. Liu (2012) Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies 5 (1), pp. 1–167.
  • Y. Lu, M. Castellanos, U. Dayal, and C. Zhai (2011) Automatic construction of a context-aware sentiment lexicon: an optimization approach. In Proceedings of the 20th International Conference on World Wide Web (WWW-2011), pp. 347–356.
  • S. Mohammad, C. Dunne, and B. Dorr (2009) Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP-2009), pp. 599–608.
  • K. P. Murphy, Y. Weiss, and M. I. Jordan (1999) Loopy belief propagation for approximate inference: an empirical study. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 467–475.
  • B. Pang and L. Lee (2008) Using very simple statistics for review search: an exploration. In Proceedings of the International Conference on Computational Linguistics (COLING-2008).
  • J. Pearl (1982) Reverend Bayes on inference engines: a distributed hierarchical approach. Cognitive Systems Laboratory, School of Engineering and Applied Science, University of California, Los Angeles.
  • W. Peng and D. H. Park (2011) Generate adjective sentiment dictionary for social media sentiment analysis using constrained nonnegative matrix factorization. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (ICWSM-2011).
  • G. Qiu, B. Liu, J. Bu, and C. Chen (2011) Opinion word expansion and target extraction through double propagation. Computational Linguistics 37 (1), pp. 9–27.
  • D. Rao and D. Ravichandran (2009) Semi-supervised polarity lexicon induction. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL-2009), pp. 675–682.
  • S. Rothe, S. Ebert, and H. Schütze (2016) Ultradense word embeddings by orthogonal transformation. arXiv preprint arXiv:1602.07572.
  • I. San Vicente, R. Agerri, and G. Rigau (2014) Simple, robust and (almost) unsupervised generation of polarity lexicons for multiple languages. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2014), pp. 88–97.
  • M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede (2011) Lexicon-based methods for sentiment analysis. Computational Linguistics 37 (2), pp. 267–307.
  • H. Takamura, T. Inui, and M. Okumura (2007) Extracting semantic orientations of phrases from dictionary. In Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL-2007).
  • Z. Teng, D. T. Vo, and Y. Zhang (2016) Context-sensitive lexicon features for neural sentiment analysis. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-2016), pp. 1629–1638.
  • P. D. Turney and M. L. Littman (2003) Measuring praise and criticism: inference of semantic orientation from association. ACM Transactions on Information Systems.
  • A. Valitutti, C. Strapparava, and O. Stock (2004) Developing affective lexical resources. PsychNology Journal 2 (1), pp. 61–83.
  • L. Velikovich, S. Blair-Goldensohn, K. Hannan, and R. McDonald (2010) The viability of web-derived polarity lexicons. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2010), pp. 777–785.
  • S. Volkova, T. Wilson, and D. Yarowsky (2013) Exploring sentiment in social media: bootstrapping subjectivity clues from multilingual Twitter streams. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-2013).
  • B. Wang and H. Wang (2008) Bootstrapping both product features and opinion words from Chinese customer reviews with cross-inducing. In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP-2008).
  • S. Wang, S. Mazumder, B. Liu, M. Zhou, and Y. Chang (2018) Target-sensitive memory networks for aspect sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 957–967.
  • Y. Wang, Y. Zhang, and B. Liu (2017) Sentiment lexicon expansion based on neural PU learning, double dictionary lookup, and polarity association. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP-2017), pp. 553–563.
  • T. Wilson, J. Wiebe, and P. Hoffmann (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP-2005), pp. 347–354.
  • Y. Wu and M. Wen (2010) Disambiguating dynamic sentiment ambiguous adjectives. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING-2010), pp. 1191–1199.
  • G. Xu, X. Meng, and H. Wang (2010) Build Chinese emotion lexicons using a graph-based algorithm and multiple resources. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING-2010), pp. 1209–1217.
  • M. Yang, B. Peng, Z. Chen, D. Zhu, and K. Chow (2014) A topic model for building fine-grained domain-specific emotion lexicon. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers).
  • J. Yu and J. Jiang (2016) Learning sentence embeddings with auxiliary tasks for cross-domain sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-2016), pp. 236–246.
  • Y. Zhao, B. Qin, and T. Liu (2012) Collocation polarity disambiguation using web-based pseudo contexts. In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP-2012).
  • L. Zhuang, F. Jing, and X. Zhu (2006) Movie review mining and summarization. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM-2006), pp. 43–50.