Over the past few years, microblogs have become one of the most popular forms of online social networking, with Twitter and Weibo as two of the most representative platforms. With more than 400 million tweets per day on Twitter and more than 66 million daily active users on Weibo, microblog users generate a large volume of tweets covering a rich set of topics, including political issues, celebrity gossip, and personal life. Because this user-generated content (UGC) is both topically rich and real-time, mining and analyzing it benefits both industry and academia. Since human inspection of this vast stream of real-time data is expensive and time-consuming, automatic computational tools and mining methods are in great demand. Tweets differ from conventional text: they are short and often contain informal, irregular, or new words, so it is difficult to infer a user's intention in publishing a tweet, or the user's attitude towards a certain topic, from topic-level and sentiment-level analysis alone. Recently, a variety of studies have been conducted on Twitter, including event detection, user recommendation, and tweet classification. Tweet classification attracts considerable attention because it is important for analyzing, understanding, and predicting user behavior on social networks.
Most existing work focuses on either tweet purpose classification or tweet position classification. For purpose classification, previous work [naaman2010really, alhadi2011exploring] has classified tweets into different classes of purpose, e.g., social interaction with people, promotion or marketing, information sharing, etc. For position classification [saif2012alleviating, go2009twitter, kouloumpis2011twitter], tweets are classified as positive, negative, or neutral. However, treating tweet purpose and position separately has two limitations. First, two different classifiers must be trained to determine the purpose and the position of a tweet, which is inefficient. Second, the correlation between tweet purpose and position is not exploited. For example, for the topic Obama care, the data set introduced in Table 3 shows that people tend to be positive towards the policy when they share information, while they are more likely to oppose it when they interact with others. Capturing these correlations would benefit both tweet position and purpose classification.
To overcome these limitations, in this study we aim to identify tweet purpose and position simultaneously by exploiting the correlation between purpose and position in tweets. Tweet purpose indicates the user's intention in publishing a tweet, such as sharing information or expressing personal emotion. Tweet position indicates whether the user supports, opposes, or is neutral towards a given topic. We transform the problem of identifying tweet purpose and position simultaneously into a multi-label classification problem. Our method is advantageous in two respects: (1) It is more efficient to use multi-label classification methods to identify tweet purpose and position simultaneously, since only one unified classifier needs to be trained. (2) The correlation between tweet purpose and position can be captured by multi-label classification methods to improve classification accuracy. In addition, a multi-label classifier may predict no label or multiple labels for some tweets, so we propose two post-processing strategies to tackle this issue. To validate the effectiveness of the problem transformation and the post-processing strategies, we build two data sets collected from Twitter and conduct experiments on them.
In short, this paper makes the following contributions:
We define the task of identifying tweet purpose and position simultaneously and transform it into a multi-label classification problem.
We propose two post-processing strategies for this classification task, i.e., summation and weighted summation. Incorporating these strategies into the multi-label classification method improves the classification performance.
We test our approach on two real-world data sets to validate the classification method with post-processing. The results demonstrate the effectiveness of the problem transformation and the post-processing strategies.
The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 presents the collection and annotation of data sets. In Section 4, the multi-label classification method and the post-processing strategies are introduced. Section 5 presents the experiments and finally we conclude this study in Section 6.
2 Related Work
In this section, we review the related work in two areas: sentiment analysis on microblogs and tweet purpose identification.
Sentiment Analysis on Microblog
Sentiment analysis has recently played a crucial role in natural language processing and text mining. It aims to analyze people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes [liu2012sentiment]. Most existing work addresses sentiment polarity classification, i.e., classifying opinions into two categories (positive and negative) or three categories (positive, negative and neutral), and the experiments are generally conducted on movie or product review data.
With the popularity of microblogs, some sentiment analysis work has been carried out on microblog platforms. Due to the short length of tweets, the use of informal and irregular words, and the rapid evolution of language on the Internet [saif2012alleviating], sentiment analysis on microblogs is more challenging than conventional sentiment analysis on reviews or weblog posts. Go et al. [go2009twitter] applied a distantly supervised learning method to classify tweet sentiment, using emoticons as noisy labels for the training data. Kouloumpis et al. [kouloumpis2011twitter] exploited hashtags in tweets to build training data. Their experiments demonstrated that part-of-speech features may not be useful for classifying the sentiment of tweets. Apart from supervised methods, unsupervised methods have also been used in sentiment analysis on tweets. He et al. [he2012quantising] proposed a probabilistic generative model for jointly modeling sentiment labels and topics for tweets. Recently, several approaches involving natural language processing [iyer2019event, iyer2019unsupervised, iyer2019heterogeneous, iyer2017detecting, iyer2019machine, iyer2017recomob], machine learning [li2016joint, iyer2016content, honke2018photorealistic, iyer2018transparency, li2018object] and numerical optimization [radhakrishnan2016multiple, iyer2012optimal, qian2014parallel, gupta2016analysis, radhakrishnan2018new] have also been used in the visual and language domains.
Tweet Purpose Identification
Users’ intentions in using microblogs have been widely studied recently [naaman2010really, alhadi2011exploring, mohammad2013identifying]. Most studies treat tweet purpose identification as a classification task. Naaman et al. [naaman2010really], from the perspective of the characteristics of social activity and communication patterns on Twitter, categorized tweets into nine types: information sharing, self promotion, opinions/complaints, statements and random thoughts, me now, question to followers, presence maintenance, anecdote (me) and anecdote (others). Alhadi et al. [alhadi2011exploring] organized tweet purposes into a taxonomy that includes social interaction with people, promotion or marketing, sharing of resources, giving/requiring feedback, broadcast alert/urgent information, requiring/raising funding, recruitment of workers, and expression of emotions. Mohammad et al. [mohammad2013identifying] studied tweet purpose on the electoral topic. They first organized these political tweets into three types of purpose (favor, oppose, and other) and then further divided these categories into sub-categories according to the degree of emotion.
Some studies are more closely related to our work. In [huang2013sentiment], both the sentiment and the topic of tweets are modeled in a unified framework. However, that work differs from ours: it does not explore the purpose and position of tweets, and its classifiers for sentiment and topic are trained separately using different features (although its title uses the keywords “multi-label classification approach”). In our study, only one multi-label classifier is trained, and purpose and position labels are obtained simultaneously.
3 Data Set
In this section, the data collection and the label annotation rules for purpose and position are introduced.
To build the Twitter data set, we collected tweets on two topics, i.e., Obama care and death penalty. We used the Twitter Search API (https://dev.twitter.com/docs/using-search) with the queries Obama care and death penalty. We then pre-processed these tweets by removing (1) non-English tweets, (2) tweets below a minimum word count, and (3) duplicated tweets. After removing the tweets irrelevant to these two topics, we labeled tweets for each of the two topics. The statistics of the two data sets are shown in Tables 3 and 4, and the details of the annotation are introduced in the following subsection.
In Section 2, some studies on tweet purpose classification have been reviewed. Based on previous studies and the characteristics of our data sets, we organize tweets into three categories: (1) Express emotion/personal interests; (2) Information sharing; and (3) Social interaction. Some example tweets for the different purpose labels are shown in Table 1.
Position labels are based on the position expressed in the tweet towards the given topic, i.e., Obama care and death penalty, and consist of three types of labels: pro, con and neutral. Some examples of position labels are shown in Table 2.
Table 1: Example tweets for the purpose labels.
| Purpose Label | Example of Tweet |
|---|---|
| Express emotion | I looked at Obamacare and said, “Yeah. And??” I’m not alone in thinking it was a mistake to support and help re-elect him. Embarrassing. |
| Information sharing | Jimmy Kimmel found that people support the Affordable Care Act much more than Obamacare http://t.co/eTX46m9ZVi PoliticalNews |
| Social interaction | DamianBennett SheilaKihne oh, I’m sorry, did Obamacare pass with unanimous support from Republicans? Or the opposite of that? |

Table 2: Example tweets for the position labels.
| Position Label | Example of Tweet |
|---|---|
| Pro | Should bring the death penalty back! executed |
| Con | The death penalty is pure violence, a barbaric and useless violence. |
| Neutral | I have such mixed feelings on the death penalty, some people deserve it but then some people don’t executed |
4 Multi-Label Classification with Post-Processing
Random k-Labelsets for Multi-Label Classification
Unlike the traditional single-label classification task, in which every instance is associated with exactly one label, in multi-label classification each instance is associated with a set of labels. In many application domains, multi-label cases are more common. For example, the movie The Lord of the Rings can be classified into the categories action, adventure and fantasy (http://www.imdb.com/title/tt0120737/). Multi-label classification has therefore been a hot topic recently. In general, there are two types of multi-label classification methods: problem transformation and algorithm adaptation [tsoumakas2007multi, tsoumakas2010random]. Problem transformation methods transform the multi-label classification task into one or more single-label classification or ranking tasks. Algorithm adaptation methods extend existing learning algorithms to handle multi-label tasks directly. In this paper, we use a method that belongs to the problem transformation type.
The label powerset (LP) method [boutell2004learning] is a simple but effective multi-label learning method. It treats each unique set of labels that occurs in the training set as one class of a new single-label classification task, so the multi-label classification problem is transformed into a single-label (multi-class) classification problem.
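The LP transformation can be illustrated with a minimal Python sketch. The function name `label_powerset_encode` and the label strings are hypothetical names chosen for illustration; they do not come from the paper or from any library.

```python
from typing import Dict, FrozenSet, List, Tuple


def label_powerset_encode(
    label_sets: List[FrozenSet[str]],
) -> Tuple[List[int], Dict[int, FrozenSet[str]]]:
    """Map every distinct labelset in the training data to one class id."""
    class_of: Dict[FrozenSet[str], int] = {}
    encoded: List[int] = []
    for labels in label_sets:
        if labels not in class_of:
            # A labelset seen for the first time becomes a new class.
            class_of[labels] = len(class_of)
        encoded.append(class_of[labels])
    # Inverse mapping, used to turn a predicted class back into a labelset.
    decode = {class_id: labels for labels, class_id in class_of.items()}
    return encoded, decode


# Hypothetical training labels: one purpose label and one position label each.
train_labels = [
    frozenset({"information_sharing", "pro"}),
    frozenset({"social_interaction", "con"}),
    frozenset({"information_sharing", "pro"}),
    frozenset({"express_emotion", "neutral"}),
]
classes, decode = label_powerset_encode(train_labels)
print(classes)  # one class id per training instance
```

Any single-label classifier can then be trained on `classes`, and `decode` recovers the labelset of a predicted class. The sketch also makes the limitation of LP visible: a labelset that never occurs in training has no class id and can never be predicted.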
The RAkEL (Random k-Labelsets) multi-label classification method [tsoumakas2010random] is based on LP. RAkEL addresses two problems of the LP method [boutell2004learning], namely the large number of labelsets that arises when the number of labels is large and the inability to predict labelsets not observed in the training set, while keeping LP's advantage of capturing label correlations. RAkEL randomly breaks a large set of labels into a number of small labelsets and, for each labelset, trains a multi-label classifier using the LP method.
For an unlabeled instance, the final decision combines the results generated by all LP classifiers using the majority voting rule. In RAkEL, the labelset size k and the number of models can be specified. An example of applying this model in our study is shown in Table 5, where each label is either a purpose label or a position label and each entry indicates whether a model predicts that label for the instance. The average vote for a label is obtained by dividing the number of times the label is predicted by the number of models whose labelset contains it. A label is included in the final prediction when its average vote exceeds the threshold. In the example, two labels have average votes above the threshold, so the final prediction consists of these two labels.
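The voting rule can be sketched in a few lines of Python. The labelsets, the per-model predictions and the threshold of 0.5 below are illustrative assumptions, not the values of Table 5, and `average_votes` and `final_prediction` are hypothetical helper names.

```python
from typing import Dict, List, Set


def average_votes(
    model_labelsets: List[Set[str]],
    model_predictions: List[Set[str]],
) -> Dict[str, float]:
    """Average vote per label over the models whose labelset contains it."""
    positive: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for labelset, predicted in zip(model_labelsets, model_predictions):
        for label in labelset:
            total[label] = total.get(label, 0) + 1
            positive[label] = positive.get(label, 0) + (label in predicted)
    return {label: positive[label] / total[label] for label in total}


def final_prediction(votes: Dict[str, float], threshold: float) -> Set[str]:
    """Keep the labels whose average vote exceeds the threshold."""
    return {label for label, vote in votes.items() if vote > threshold}


# Four hypothetical LP models, each trained on a labelset of size k = 2.
labelsets = [{"info", "pro"}, {"info", "con"}, {"social", "pro"}, {"social", "con"}]
predictions = [{"info", "pro"}, {"info", "con"}, {"pro"}, set()]

votes = average_votes(labelsets, predictions)
result = final_prediction(votes, threshold=0.5)  # threshold is an assumption
print(votes, result)
```

Here `con` is predicted by one of the two models that cover it, so its average vote equals the threshold and it is not selected. Nothing in this rule forces exactly one purpose label and one position label to survive, which is the gap the post-processing step is meant to close.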
Post-processing Strategies for RAkEL
In our application, it is assumed that each tweet has exactly one purpose label and one position label. However, multi-label classification methods treat all labels equally. Therefore, if RAkEL is used directly, some tweets may be predicted to have no label or multiple labels for purpose or position. To deal with this problem, we propose post-processing strategies for RAkEL. For a tweet assigned no label or multiple labels for purpose and/or position, we find the K tweets in the training set that are most similar to it and use the labels of these K tweets to make a new prediction. Two strategies can be used to make the new prediction: (1) the summation strategy; and (2) the weighted summation strategy.
An example of the post-processing strategies is shown in Figure 1. The original prediction for a tweet contains no labels, so we find the most similar tweets in the training set and record the similarity between each neighbor and the original tweet. In the summation strategy, we sum up the labels from all neighbors and choose the index of the largest sum as the new prediction for the original tweet, i.e., the third label in the figure. In the weighted summation strategy, the label vector of each neighbor is first multiplied by the corresponding similarity value, and the weighted vectors are then summed to select the index of the largest value, i.e., the first label in the figure.
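The two strategies can be sketched as follows. The neighbor label vectors and similarity values are made up for illustration (they are not the values of Figure 1), and breaking ties in favor of the lowest index is an assumption, since the text does not specify tie handling.

```python
from typing import List, Sequence


def summation(neighbor_labels: Sequence[Sequence[int]]) -> int:
    """Index of the label with the largest unweighted sum over neighbors."""
    totals = [sum(column) for column in zip(*neighbor_labels)]
    return totals.index(max(totals))  # ties resolve to the lowest index


def weighted_summation(
    neighbor_labels: Sequence[Sequence[int]],
    similarities: Sequence[float],
) -> int:
    """Index of the label with the largest similarity-weighted sum."""
    totals: List[float] = [
        sum(weight * value for weight, value in zip(similarities, column))
        for column in zip(*neighbor_labels)
    ]
    return totals.index(max(totals))


# Four hypothetical neighbors with one-hot labels over three candidate labels.
neighbors = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
]
sims = [0.9, 0.3, 0.3, 0.2]  # similarity of each neighbor to the query tweet

by_sum = summation(neighbors)                  # column sums: [1, 1, 2]
by_wsum = weighted_summation(neighbors, sims)  # weighted sums: [0.9, 0.2, 0.6]
print(by_sum, by_wsum)
```

In this toy case the two strategies disagree: plain summation follows the two weakly similar neighbors, while weighted summation follows the single highly similar neighbor.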
In order to find the most similar tweets, a similarity metric is required. In this study, we use the widely used cosine similarity, which measures the cosine of the angle between two vectors of an inner product space. Given feature vectors $\mathbf{a}$ and $\mathbf{b}$, the cosine similarity is calculated as follows:
\[ \cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} \]
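A minimal sketch of the neighbor search with cosine similarity is given below. Returning a similarity of 0 for an all-zero vector is an assumption made to avoid division by zero, and `top_k_neighbors` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an empty feature vector matches nothing
    return dot / (norm_a * norm_b)


def top_k_neighbors(
    query: Sequence[float],
    training: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (training index, similarity) for the k most similar vectors."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(training)]
    # Sort by descending similarity, then by index for a deterministic order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


training_vectors = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
nearest = top_k_neighbors([1.0, 0.0], training_vectors, k=2)
print(nearest)
```

The similarity values returned here are the weights that the weighted summation strategy would apply to each neighbor's labels.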
Combining the RAkEL method and the post-processing strategies forms the complete process. It consists of three steps, i.e., training, testing and post-processing, and the algorithm is presented in detail in Algorithm 37.
5 Experiments
In the experiments, we randomly split each data set into a training set and a test set. We compare the following methods in the experimental study:
KNN: Since our proposed post-processing strategies are based on the KNN model, we use KNN as one of the baselines in the comparison.
SVM: As stated in Section 4, the RAkEL method is based on the LP method, which uses single-label classifiers to make predictions. In the experiments, SVM is applied as the single-label classifier, so SVM is used as another baseline.
RAkEL: The introduction of RAkEL is presented in Section 4.
RAkEL+sum: RAkEL with the summation strategy.
RAkEL+wsum: RAkEL with the weighted summation strategy.
In order to classify tweets, each tweet in the data set is represented as a vector of features. Some commonly used text classification features are employed in the experiments, including n-grams, punctuation, part-of-speech and Twitter-specific features. The details of the features are given below.
n-gram: We use the presence of n-grams, including unigrams and bigrams in the experiments, as the features, i.e., each feature is a binary value indicating whether the n-gram occurs in the tweet. To reduce the dimensionality of the n-gram features, we remove the unigrams and the bigrams whose numbers of occurrences in the data set fall below their respective frequency thresholds.
punctuation: The number of occurrences of exclamation marks, question marks and colons.
POS (part-of-speech): The number of occurrences of each POS tag is used as the feature. The Tweet NLP and POS Tagging tool (http://www.ark.cs.cmu.edu/TweetNLP/) [owoputi2013improved] is used to extract POS features for each tweet.
Twitter-specific features: several typical Twitter-specific features are utilized including:
the number of hashtags, i.e., the # symbol;
the number of mentioned users, i.e., the @ symbol;
the presence of a retweet, i.e., the RT symbol;
the number of hyperlinks including URLs and e-mail addresses.
In the following experiments, we consider unigram features, bigram features, and a combination of unigram and bigram features. We also combine the punctuation features and the Twitter-specific features into the statistical features and use STAT to denote this combination. POS is used to denote all the POS features.
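As a rough illustration of the punctuation and Twitter-specific features, the sketch below counts them with regular expressions. The patterns, the feature names, and the choice to strip links before counting punctuation (so that the colon in `http://` is not counted) are assumptions for illustration, not the paper's extraction rules. The sketch also assumes the usual Twitter conventions of `#` for hashtags, `@` for mentions and `RT` for retweets.

```python
import re
from typing import Dict

# Hyperlinks: URLs and e-mail addresses (simplified, illustrative patterns).
LINK = re.compile(r"https?://\S+|www\.\S+|[\w.+-]+@[\w-]+\.[\w.-]+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
RETWEET = re.compile(r"\bRT\b")


def stat_features(tweet: str) -> Dict[str, int]:
    """Count punctuation and Twitter-specific cues in one tweet."""
    links = LINK.findall(tweet)
    # Remove links first so URL colons and e-mail '@' signs are not counted.
    text = LINK.sub(" ", tweet)
    return {
        "exclamation": text.count("!"),
        "question": text.count("?"),
        "colon": text.count(":"),
        "hashtags": len(HASHTAG.findall(text)),
        "mentions": len(MENTION.findall(text)),
        "is_retweet": int(RETWEET.search(text) is not None),
        "links": len(links),
    }


example = "RT @alice: Obamacare works! Really? http://t.co/abc #health mail bob@example.com"
features = stat_features(example)
print(features)
```

The resulting dictionary can be concatenated with n-gram presence indicators and POS tag counts to form the full feature vector of a tweet.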
Different evaluation metrics have been used in multi-label classification [tsoumakas2007multi]. In this study, we apply Hamming loss to evaluate the performance. Hamming loss, based on Hamming distance, takes into account the prediction error (an incorrect label is predicted) and the missing error (a relevant label is not predicted), normalized over the total number of classes and the total number of examples [sorower2010literature]. The Hamming loss is defined as follows:
\[ \mathrm{HammingLoss} = \frac{1}{N} \sum_{i=1}^{N} \frac{\lvert Y_i \,\Delta\, Z_i \rvert}{L} \]
where $N$ is the number of examples in the test data and $L$ is the number of labels. $Y_i$ and $Z_i$ denote the sets of true and predicted labels for instance $i$, respectively. $\Delta$ stands for the symmetric difference of two sets and corresponds to the exclusive OR (XOR) operation in Boolean logic [tsoumakas2007multi]. Intuitively, the smaller the Hamming loss, the better the performance. A Hamming loss of $0$ would be the ideal case, indicating that there is no error in the prediction.
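The Hamming loss as described, i.e., the size of the symmetric difference between true and predicted labelsets, normalized by the number of labels and the number of examples, can be computed with a short sketch. The label names and the count of six labels (three purpose labels plus three position labels) are used here only as an illustrative setting.

```python
from typing import Sequence, Set


def hamming_loss(
    true_sets: Sequence[Set[str]],
    predicted_sets: Sequence[Set[str]],
    num_labels: int,
) -> float:
    """Average fraction of labels that are wrongly predicted or missed."""
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true and predicted label lists differ in length")
    # '^' on Python sets is the symmetric difference (set-level XOR).
    errors = sum(len(y ^ z) for y, z in zip(true_sets, predicted_sets))
    return errors / (len(true_sets) * num_labels)


y_true = [{"info", "pro"}, {"social", "con"}]
y_pred = [{"info", "pro"}, {"social", "neutral"}]

loss = hamming_loss(y_true, y_pred, num_labels=6)
print(loss)
```

In the second example the position label is wrong, which contributes two errors: one incorrectly predicted label (`neutral`) and one missed label (`con`).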
Since a multi-label classification method is employed in the experiments, the Hamming loss is applied as the evaluation measure. However, the single-label classifiers used in the experiments, such as KNN and SVM, cannot be evaluated directly using Hamming loss. Therefore, the purpose labels and position labels generated by the two individual classifiers are combined, so that the combined labels have the same form as the results generated by multi-label classifiers.
To validate the effectiveness of the multi-label classification method and the post-processing strategies in this application, SVM is used as a baseline. Two SVM classifiers are trained, one for purpose and one for position, using the LIBSVM tools (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). Moreover, since our proposed post-processing strategies are based on the KNN method, we also report the results generated by KNN (with the parameter K set to 10). For multi-label classification, we use the implementation of RAkEL in Mulan (http://mulan.sourceforge.net/). The results of the experiments are shown in Table 6, which reports five feature sets: unigrams, bigrams, unigrams plus bigrams, and the unigram-plus-bigram combination extended first with one and then with both of the STAT and POS feature groups. In these results, the number of similar neighbors K used in post-processing is also set to 10, and the influence of this parameter is discussed in the next section.
Some conclusions can be drawn from the results reported above.
The multi-label classification method RAkEL, with or without post-processing, performs better than the single-label classifiers in this study, i.e., KNN and SVM, on both data sets. Due to the different characteristics of the data sets, however, the scales of the Hamming loss differ. Among all the methods, RAkEL with the weighted summation strategy performs best. For example, RAkEL+wsum achieves a lower Hamming loss than KNN on the Obama care data set.
RAkEL with post-processing, using either the summation strategy or the weighted summation strategy, generates better results than the original RAkEL method, which validates the effectiveness of our proposed post-processing strategies. The weighted summation strategy is the more effective of the two on both data sets. For example, both RAkEL+wsum and RAkEL+sum improve the results on the Obama care data set.
Across all the results, the combination of unigram+bigram, STAT and POS features performs best. In general, moving from the n-gram features alone to the combination of unigram+bigram, STAT and POS features, the more features are introduced, the better the performance. This trend can be seen, for example, on the Obama care data set.
Sensitivity Analysis for the Post-processing Stage
The post-processing strategies have one parameter that may influence the performance, i.e., the number of most similar neighbors K. Therefore, in this section, we study the influence of different values of K. Since the combination of unigram+bigram, STAT and POS features performs best in the above experiments, we study the influence of K only for this combination of features, on the Death Penalty data set.
The performance of the RAkEL+sum and RAkEL+wsum strategies for different values of K on the unigram+bigram, STAT and POS features is presented in Figure 2 and Figure 3, with K varied over a range of values at a fixed interval. The results show that the best K differs between the post-processing strategies. For purpose classification, the RAkEL+sum strategy reaches its optimum at a smaller K, whereas the RAkEL+wsum strategy prefers larger K. However, since the differences between the results are small, the results are not very sensitive to the choice of K.
6 Conclusion and Future Work
Analyzing the purpose and position of tweets is beneficial for many areas. In this paper, we study the problem of identifying tweet purpose and position simultaneously. We first transform this problem into a multi-label classification problem to capture the label correlations, and then propose post-processing strategies for the multi-label classification method RAkEL to classify tweets. To validate the effectiveness of our work, we build two data sets from Twitter on the topics Obama care and Death Penalty. Experiments conducted on these data sets show that the proposed method outperforms the baseline methods in terms of Hamming loss. Furthermore, the influence of the parameter in the post-processing strategies has been studied.
In the future, we will study this problem further in two respects. First, we will explore more features, such as negated context information and emotion lexicons. Second, we will integrate the task-specific constraint that each tweet has only one purpose label and one position label into the objective function of multi-label classification, forming a unified framework for the classification task.