An Iterative Approach for Identifying Complaint Based Tweets in Social Media Platforms

01/24/2020 ∙ by Gyanesh Anand, et al. ∙ University of Maryland Bloomberg

Twitter is a social media platform where users express opinions on a variety of issues. Posts voicing grievances or complaints can be used by private and public organizations to improve their services and to obtain a prompt, low-cost assessment of them. In this paper, we propose an iterative methodology that aims to identify complaint-based posts pertaining to the transport domain. We perform comprehensive evaluations and release a novel dataset for research purposes.





Introduction

With the advent of social media platforms, a growing user base voices its grievances on these platforms in the form of complaints. According to [3], a complaint is a basic speech act used to express a negative mismatch between expectation and reality. Transportation and its related logistics industries are the backbone of every economy. Many transport organizations rely on complaints gathered via these platforms to improve their services; hence, understanding these complaints is important for (1) linguists, to identify human expressions of criticism, and (2) organizations, to improve their query response time and address concerns effectively.

The presence of inevitable noise and sparse content, along with rephrased and structurally morphed instances of posts, makes the task at hand difficult [4]. Previous work [2] on complaint extraction has focused on static datasets only. Such approaches are not robust to changes in trends, information flow, and linguistic variation. We propose an iterative, semi-supervised approach for identifying complaint-based tweets that can be replicated for a stream of information. We prefer a semi-supervised approach over supervised ones for two reasons: (a) the effort of isolating a training set makes supervised approaches less attractive and impractical, and (b) the imbalance between the subjective and objective classes leads to poor performance.

Proposed Methodology

We aimed to mimic the presence of sparse/noisy content distribution, mandating the need to curate a novel dataset via specific lexicons. We scraped random posts from a recognized transport forum. A pool of uni/bi-grams was created based on tf-idf representations extracted from the posts, and was further pruned by annotators. Querying posts on Twitter with the extracted lexicons yielded a collection of tweets. To ensure lexical diversity, we added randomly sampled tweets to our dataset. In spite of the sparse nature of these posts, their lexical characteristics act as information cues.

Figure 1 gives a pictorial representation of our methodology. Our approach required an initial set of informative tweets, for which two human annotators labeled a random sub-sample of the original dataset. Each sampled tweet was marked as informative or non-informative according to the following criterion: Is the tweet addressing any complaint or raising grievances about modes of transport or services/events associated with transportation, such as traffic or public or private transport? An example tweet marked as informative: No, metro fares will be reduced ???, but proper fare structure needs to presented right, it’s bad !!!.

We used tf-idf to identify the initial seed phrases from the curated set of informative tweets. The terms with the highest tf-idf scores were matched against the complete dataset, and the transport-relevant tweets were identified by sub-string match. Redundant tweets were filtered out based on their cosine similarity score.
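The seed selection and filtering step can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the tokenizer, the smoothed idf formula, the function names (`top_tfidf_terms`, `filter_relevant`), the number of seed terms `k`, the similarity cut-off `max_similarity` and the toy tweets are all assumptions made for illustration, since the paper does not specify them here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_tfidf_terms(docs, k):
    """Return the k terms with the highest tf-idf score summed over docs."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = Counter()
    for toks in tokenized:
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            # Smoothed idf; the exact weighting used in the paper is not stated.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(k)]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_relevant(tweets, seeds, max_similarity=0.8):
    """Keep tweets containing a seed as a sub-string, dropping near-duplicates."""
    kept, kept_vectors = [], []
    for tweet in tweets:
        lowered = tweet.lower()
        if not any(seed in lowered for seed in seeds):
            continue
        vector = Counter(tokenize(tweet))
        if any(cosine(vector, other) >= max_similarity for other in kept_vectors):
            continue  # too similar to a tweet that was already kept
        kept.append(tweet)
        kept_vectors.append(vector)
    return kept


# Toy data, invented for illustration only.
informative = [
    "metro fares are too high and the metro is always late",
    "bus was late again, terrible service",
]
tweets = [
    "The metro was delayed for an hour today",
    "the metro was delayed for an hour today!!",
    "Lovely weather this morning",
    "Bus drivers on strike, no service",
]
seeds = top_tfidf_terms(informative, k=3)
relevant = filter_relevant(tweets, seeds)
print(seeds, relevant)
```

Sub-string matching is deliberately permissive (a seed such as "bus" would also match "business"), which is one reason a subsequent redundancy and relevance filter is useful.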

Implicit information indicators were identified based on the domain relevance score, a metric used to gauge the coverage of an n-gram when evaluated against a randomly created pool of posts.

We collected a pool of randomly sampled tweets from outside the data collection period. The rationale behind this metric was to discard commonly occurring n-grams, normalized by random noise, and to retain those of lexical importance. We used terms with a high domain relevance score (threshold determined experimentally) as seed phrases for the next set of iterations. The growing dictionary augments the collection process. The process ran until no new lexicons were identified, yielding the set of transport-relevant tweets. To identify linguistic signals associated with complaint posts, we randomly sampled a set of tweets to use as the training set, manually annotated with two distinct labels: complaint relevant and complaint non-relevant. We employed the following features on our dataset.
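The iterative seed expansion described above can be sketched as follows. The exact formula for the domain relevance score is not given in this text, so the score used here (the share of domain posts containing an n-gram, normalized by that share plus its share in the random pool) is an assumption, as are the names `domain_relevance` and `expand_seeds`, the n-gram orders, the `threshold`, `min_count` and `max_iters` defaults, and the toy data.

```python
import re
from collections import Counter


def ngrams(text, max_n=3):
    """Return the set of word n-grams (n = 1..max_n, an assumed range) in a post."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def domain_relevance(domain_posts, random_posts, min_count=2):
    """ASSUMED score in (0, 1]: an n-gram's share of domain posts divided by
    the sum of its shares in the domain pool and in the random pool."""
    domain_df = Counter(g for post in domain_posts for g in ngrams(post))
    random_df = Counter(g for post in random_posts for g in ngrams(post))
    scores = {}
    for gram, count in domain_df.items():
        if count < min_count:  # ignore n-grams seen in too few domain posts
            continue
        p_domain = count / len(domain_posts)
        p_random = random_df[gram] / len(random_posts) if random_posts else 0.0
        scores[gram] = p_domain / (p_domain + p_random)
    return scores


def match(tweets, seeds):
    """Tweets containing at least one seed phrase as a sub-string."""
    return [t for t in tweets if any(s in t.lower() for s in seeds)]


def expand_seeds(tweets, random_posts, seeds, threshold=0.8, max_iters=10):
    """Grow the seed dictionary until an iteration adds no new phrase."""
    seeds = set(seeds)
    for _ in range(max_iters):
        relevant = match(tweets, seeds)
        scores = domain_relevance(relevant, random_posts)
        new_seeds = {g for g, s in scores.items() if s >= threshold} - seeds
        if not new_seeds:
            break
        seeds |= new_seeds
    return seeds, match(tweets, seeds)


# Toy data, invented for illustration only.
all_tweets = [
    "metro delayed again at rush hour",
    "train delayed again, no announcement",
    "bus delayed again this morning",
    "great coffee this morning",
]
random_pool = [
    "great coffee this morning",
    "watching a movie this morning",
    "again and again",
]
final_seeds, relevant_tweets = expand_seeds(all_tweets, random_pool, {"metro", "train"})
print(sorted(final_seeds))
print(relevant_tweets)
```

In this toy run, "delayed" is promoted to a seed because it never occurs in the random pool, while "again" is held back because it also appears in random posts; the new seed then pulls in the bus tweet on the next pass.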

Figure 1: Pictorial representation of the proposed pipeline.

Linguistic markers. To capture linguistic aspects of complaints, we utilized Bag of Words, count of POS tags and Word2vec clusters.

Sentiment markers. We used a quantified score based on the ratio of tokens that appear in the following lexicons: MPQA, NRC, VADER and Stanford.
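A lexicon-ratio score of this kind could look like the sketch below. The function name `lexicon_ratio`, the tokenizer and the tiny `toy_negative` word set are hypothetical; the real MPQA, NRC, VADER and Stanford resources have their own formats and scoring conventions, which are not reproduced here.

```python
import re


def lexicon_ratio(text, lexicon):
    """Fraction of the tweet's tokens that appear in the given lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


# Hypothetical stand-in for a negative-sentiment word list.
toy_negative = {"bad", "late", "terrible", "worst"}
score = lexicon_ratio("The bus was late and the service was terrible", toy_negative)
print(round(score, 3))
```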

Information specific markers. These are a set of handcrafted features associated with complaints: (a) Text Meta-Data, which includes the counts of URLs, hashtags, user mentions and special symbols, used to enhance retweet impact; (b) Request Identification, for which we employed the model presented in [1] to identify whether a specific tweet assertion is a request; (c) Intensifiers, a feature set derived from the number of words starting with capital letters and the repetition of special symbols (exclamation and question marks) within the same post; (d) Politeness Markers, for which we use the politeness score of the tweet extracted from the model presented in [1]; (e) Pronoun Variation, which can reveal personal involvement or intensify involvement; we use the frequency of pronoun types, obtained using pre-defined dictionaries.
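The text meta-data and intensifier features lend themselves to a short sketch. The regular expressions, the feature names and the function `surface_features` below are assumptions chosen for illustration; the paper does not state how these counts were extracted.

```python
import re


def surface_features(tweet):
    """Count simple meta-data and intensifier cues in a single tweet."""
    tokens = tweet.split()
    return {
        "urls": len(re.findall(r"https?://\S+", tweet)),
        "hashtags": len(re.findall(r"#\w+", tweet)),
        "mentions": len(re.findall(r"@\w+", tweet)),
        # Words whose first character is an uppercase letter.
        "capitalized_words": sum(1 for t in tokens if t[0].isupper()),
        # Runs of two or more exclamation or question marks.
        "repeated_marks": len(re.findall(r"[!?]{2,}", tweet)),
    }


example = ("No, metro fares will be reduced ???, but proper fare structure "
           "needs to presented right, it's bad !!!")
features = surface_features(example)
print(features)
```

Applied to the example tweet quoted earlier in the paper, this counts one capitalized word and two runs of repeated punctuation, and no URLs, hashtags or mentions.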

From the pool of transport-relevant tweets, we sampled tweets to use as the testing set. The results, obtained with cross-validation, are reported in Table 1. As the number of iterations increases, the pool of seed phrases is refined and augments the selection of transport-relevant tweets. The proposed pipeline is tailored to identify complaint-relevant tweets in a noisy scenario.


Table 1 shows that the BOW model gave the best results in terms of both accuracy and F1-score. Among the sentiment models, Stanford Sentiment achieved the best result (0.63 F1-score), with the others in the same range; linguistic features collectively gave the best performance.

Model                          Accuracy (%)   F1-score

Linguistic Markers
  Bag-of-Words                 75.3           0.71
  POS Tags                     70.1           0.66
  Word2Vec cluster             72.1           0.67

Sentiment Markers
  Sentiment-MPQA               68.2           0.61
  Sentiment-NRC                67.9           0.59
  Sentiment-VADER              68.0           0.62
  Sentiment-Stanford           68.7           0.63

Information Specific Markers
  Text Meta-Data               69.3           0.62
  Request Identification       70.1           0.66
  Intensifiers                 72.5           0.67
  Politeness Markers           70.4           0.63
  Pronoun Variations           69.6           0.65

Table 1: Performance of various linguistic, sentiment and information specific features on our dataset. The classifier used was Logistic Regression (Elastic Net regularization), as it gave the best performance compared to its counterparts.

Conclusion and Future Work

In this paper, we presented a novel semi-supervised pipeline along with a novel dataset for the identification of complaint-based posts in the transport domain. The proposed methodology can be extended to other fields by altering the lexicons used to create information cues. This analysis has limitations: we do not use neural networks, which require a large volume of data. In the future, we aim to identify demographic features for the identification of complaint-based posts on social media platforms.


  • [1] C. Danescu-Niculescu-Mizil, R. West, D. Jurafsky, J. Leskovec, and C. Potts (2013) No country for old members: user lifecycle and linguistic change in online communities. In Proceedings of the 22nd International Conference on World Wide Web, pp. 307–318.
  • [2] M. E. Meinl (2013) Electronic complaints: an empirical study on British English and German complaints on eBay. Vol. 18, Frank & Timme GmbH.
  • [3] E. Olshtain and L. Weinbach (1985) Complaints: a study of speech act behavior among native and nonnative speakers of Hebrew. Tel Aviv University.
  • [4] R. Shah and R. Zimmermann (2017) Multimodal analysis of user-generated multimedia content. Springer.