A Comparative Study of Sentiment Analysis Using NLP and Different Machine Learning Techniques on US Airline Twitter Data

by Md. Taufiqul Haque Khan Tusar, et al.

Today's business ecosystem has become very competitive, and customer satisfaction has become a major focus for business growth. Business organizations spend a lot of money and human resources on various strategies to understand and fulfill their customers' needs. However, because manual analysis of customers' diverse needs is error-prone, many organizations fail to achieve customer satisfaction. As a result, they lose customer loyalty and spend extra money on marketing. Sentiment Analysis, a combined technique of Natural Language Processing (NLP) and Machine Learning (ML), can help solve these problems. It is broadly used to extract insights from wider public opinion on certain topics, products, and services, and it can be applied to any data available online. In this paper, we apply two NLP techniques (Bag-of-Words and TF-IDF) and various ML classification algorithms (Support Vector Machine, Logistic Regression, Multinomial Naive Bayes, Random Forest) to find an effective approach for Sentiment Analysis on a large, imbalanced, and multi-classed dataset. Our best approaches provide 77% accuracy using Support Vector Machine and Logistic Regression with the Bag-of-Words technique.




I Introduction

Customer satisfaction is an assessment of consumers' perception of products, services, and organizations. Many researchers have found that the quality of products or services and customer happiness are the most essential aspects of business performance [1]. To stay competitive, businesses must carefully consider what their customers require and want from the products or services they provide. They must also manage their customers well, keeping them satisfied so that they continue to do business with them [2]. In [3], the author investigated data from 2007 to 2011 on the service quality and customer satisfaction of the top 14 U.S. airlines. The results reveal that the airline sector has been struggling to provide outstanding services and to meet the requirements of diverse consumer groups.

Most of the data on social networks and other platforms is unstructured, so extracting customers' opinions from it and making decisions based on them is laborious. Sentiment Analysis is an effective approach for detecting people's opinions. Its principal aim is to classify the polarity of textual data as positive, negative, or neutral. Sentiment Analysis tools enable decision-makers to track changes in public or customer sentiment regarding entities, activities, products, technologies, and services [4]. With the help of Sentiment Analysis, a business organization can improve its products and services, and a political party or social organization can improve the quality of its work. Sentiment Analysis also makes it easier to understand broad public opinion in a short time.

Most of the data for Sentiment Analysis is collected from social media platforms and stored as datasets. However, analyzing sentiment becomes challenging when the datasets are large, imbalanced, or multi-classed.

In this paper, we work with a large, imbalanced, multi-classed, real-world dataset named Twitter US Airline Sentiment [13]. We apply NLP techniques to pre-process and vectorize the data and then classify the polarity of the tweets with Machine Learning classification algorithms. The algorithms applied are Support Vector Machine, Multinomial Naive Bayes, Random Forest, and Logistic Regression, and the NLP techniques are Bag-of-Words and Term Frequency - Inverse Document Frequency (TF-IDF). Finally, we compare the applied Machine Learning algorithms and NLP techniques to find the best approach.

II Methodology

Our approach consists of the following steps:

  1. Collecting a dataset to train and test the ML classifiers.

  2. Pre-processing the dataset for subsequent processing.

  3. Converting the textual data into vector form using NLP techniques.

  4. Dividing the dataset into training and testing sets, training the ML classifiers on the training data, and predicting the polarity of the testing data.

Fig. 1 depicts the workflow of Sentiment Analysis using NLP and different Machine Learning techniques.

Figure 1: Workflow of Sentiment Analysis using NLP and Machine Learning

II-A Data Collection

The data originally came from CrowdFlower's Data for Everyone library. Contributors scraped tweets from travelers who flew with six US airlines in February 2015 and published the data on Kaggle as a dataset named Twitter US Airline Sentiment [13] under the CC BY-NC-SA 4.0 license. The dataset has around 14,640 records and 15 attributes. Each tweet is labeled with whether its sentiment toward the airline's service is positive, neutral, or negative. Fig. 2 shows the frequency of each polarity in the dataset.

Figure 2: Frequency of Positive, Negative, and Neutral tweets in the dataset
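The class frequencies in Fig. 2 can be measured with a few lines of standard-library Python. This is a minimal sketch, not the authors' code: the helper names are hypothetical, and it assumes the Kaggle file is a CSV whose label column is named `airline_sentiment`; adjust the `column` argument if your copy differs.

```python
import csv
from collections import Counter
from typing import Iterable


def load_labels(path: str, column: str = "airline_sentiment") -> list[str]:
    """Read one sentiment label per row from a CSV copy of the dataset.

    Assumption: the label column is called "airline_sentiment";
    pass a different `column` if your copy of the file differs.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]


def class_distribution(labels: Iterable[str]) -> dict[str, float]:
    """Share of each class among all labels, most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


def imbalance_ratio(labels: Iterable[str]) -> float:
    """Size of the largest class divided by the size of the smallest one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())
```

For example, `class_distribution(load_labels("Tweets.csv"))` would return the proportions behind Fig. 2, where `Tweets.csv` is an assumed name for the local download. A ratio well above 1 from `imbalance_ratio` is what makes the dataset imbalanced and motivates the weighted metrics used later.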

II-B Pre-Processing

A tweet can contain various symbols (!, #, @, etc.), numbers, punctuation, and stop-words. Stop-words are words that carry no sentiment, such as he, she, the, is, and that. These are noise for Sentiment Analysis. We therefore cleaned the data by removing punctuation, numbers, and symbols and converting all characters to lowercase. We then split each tweet into tokens, removed stop-words from the list of tokens, and converted the remaining tokens into their base forms using Lemmatization. Finally, we stored the distinct cleaned base forms of all tweets in a list called the vocabulary. Table I shows an example of the outcome of pre-processing.

Tweet 1                 #Delicious #Beef #Cheese #Burger @McDonald Tasting CheeseBurger and Hamburger
After Pre-processing    [delicious, beef, cheese, burger, mcdonald, taste, cheeseburger, hamburger]
Tweet 2                 #Late Service @McDonald Delicious Hamburger but slow service
After Pre-processing    [late, service, mcdonald, delicious, hamburger, slow]
Vocabulary              [delicious, beef, cheese, burger, mcdonald, taste, cheeseburger, hamburger, late, service, slow]
Table I: Pre-processing of noisy data
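The pre-processing steps can be sketched without external dependencies. The paper does not name its tools, so this is an illustration under assumptions: the tiny `STOP_WORDS` set and the `LEMMAS` lookup table are hypothetical stand-ins for a full stop-word list and a real lemmatizer (for example, NLTK's `WordNetLemmatizer`), and they only cover a handful of words.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "but", "he", "is", "she", "that", "the"}

# Hypothetical stand-in for a lemmatizer: maps a few inflected forms
# to their base forms; unknown tokens are returned unchanged.
LEMMAS = {"tasting": "taste", "services": "service", "burgers": "burger"}


def preprocess(tweet: str) -> list[str]:
    """Clean, tokenize, remove stop-words, and lemmatize one tweet."""
    # Lowercase, then replace anything that is not an ASCII letter or
    # whitespace (symbols such as # and @, digits, punctuation) with a space.
    cleaned = re.sub(r"[^a-z\s]", " ", tweet.lower())
    tokens = cleaned.split()
    base_forms = [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]
    # Table I lists each base form once per tweet, so keep only the
    # first occurrence of each token, preserving order.
    return list(dict.fromkeys(base_forms))


def build_vocabulary(tweets: list[list[str]]) -> list[str]:
    """Distinct base forms across all pre-processed tweets, first-seen order."""
    return list(dict.fromkeys(t for tweet in tweets for t in tweet))
```

On Tweet 2 of Table I, `preprocess` returns `['late', 'service', 'mcdonald', 'delicious', 'hamburger', 'slow']`: the `#` and `@` symbols are stripped, "but" is dropped as a stop-word, and the repeated "service" is kept once.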

II-C Vectorization

Machine Learning models cannot work with raw text; they require numerical input. We therefore convert the textual data into vector form for subsequent processing. Two popular Natural Language Processing techniques for vectorization are (i) Bag-of-Words and (ii) Term Frequency - Inverse Document Frequency.

  • Bag-of-Words (BoW): BoW converts each tweet into a vector with one entry per vocabulary word, marking the occurrence of that word in the tweet with 1 if it appears and 0 otherwise. Table II gives the Bag-of-Words representation of Tweet 1 and Tweet 2 from Table I.

    Tokens        Tweet 1  Tweet 2
    delicious     1        1
    beef          1        0
    cheese        1        0
    burger        1        0
    mcdonald      1        1
    taste         1        0
    cheeseburger  1        0
    hamburger     1        1
    late          0        1
    service       0        1
    slow          0        1
    Table II: Bag of words

    Vector form of Tweet 1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0] and Tweet 2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1].

  • Term Frequency - Inverse Document Frequency (TF-IDF): TF-IDF weights each term by how often it appears in a document (here, a tweet), scaled down by how many documents contain it, so terms that are rare across the collection are treated as more important. TF-IDF is the product of TF and IDF:

    $$\text{TF-IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$$

    Term Frequency (TF) returns the frequency of a term (t) in each document (d) from the pre-processed vocabulary:

    $$\mathrm{TF}(t, d) = \frac{n}{\text{total number of terms in } d}$$

    where n is the number of times the term t occurs in the document d.

    Term          Tweet 1 (n)  Tweet 1 (TF)  Tweet 2 (n)  Tweet 2 (TF)
    delicious     1            1/8           1            1/6
    beef          1            1/8           0            0
    cheese        1            1/8           0            0
    burger        1            1/8           0            0
    mcdonald      1            1/8           1            1/6
    taste         1            1/8           0            0
    cheeseburger  1            1/8           0            0
    hamburger     1            1/8           1            1/6
    late          0            0             1            1/6
    service       0            0             1            1/6
    slow          0            0             1            1/6
    Table III: Term Frequency for Tweet 1 and Tweet 2

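    Inverse Document Frequency (IDF) measures how rare a term is across all documents, giving more weight to terms that appear in fewer documents:

    $$\mathrm{IDF}(t) = \ln \frac{N}{df}$$

    where N is the total number of documents and df is the number of documents containing the term t. The natural logarithm is used, so a term found in one of two documents has IDF = ln 2 ≈ 0.69.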

    N = 2
    Term          df   IDF
    delicious     2    0
    beef          1    0.69
    cheese        1    0.69
    burger        1    0.69
    mcdonald      2    0
    taste         1    0.69
    cheeseburger  1    0.69
    hamburger     2    0
    late          1    0.69
    service       1    0.69
    slow          1    0.69
    Table IV: Inverse Document Frequency for Tweet 1 and Tweet 2
    Term          TF Tweet 1  TF Tweet 2  IDF    TF-IDF Tweet 1  TF-IDF Tweet 2
    delicious     1/8         1/6         0      0               0
    beef          1/8         0           0.69   0.0863          0
    cheese        1/8         0           0.69   0.0863          0
    burger        1/8         0           0.69   0.0863          0
    mcdonald      1/8         1/6         0      0               0
    taste         1/8         0           0.69   0.0863          0
    cheeseburger  1/8         0           0.69   0.0863          0
    hamburger     1/8         1/6         0      0               0
    late          0           1/6         0.69   0               0.115
    service       0           1/6         0.69   0               0.115
    slow          0           1/6         0.69   0               0.115
    Table V: Term frequency - Inverse document frequency for Tweet 1 and Tweet 2

    In the end, Tweet 1 = [0, 0.0863, 0.0863, 0.0863, 0, 0.0863, 0.0863, 0, 0, 0, 0] and Tweet 2 = [0, 0, 0, 0, 0, 0, 0, 0, 0.115, 0.115, 0.115]. Table V shows the final calculation of TF-IDF.
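Both vectorization schemes of Tables II to V can be reproduced in plain Python. This is a minimal sketch of the textbook formulas above, with hypothetical helper names; the vocabulary and tokens are copied from Table I. Two caveats: the tables round IDF to 0.69 before multiplying, so they show 0.0863 and 0.115 where exact arithmetic gives about 0.0866 and 0.1155; and library implementations may differ from these formulas (scikit-learn's `TfidfVectorizer`, for instance, smooths IDF and L2-normalizes each row by default, while `CountVectorizer(binary=True)` yields 1/0 vectors like Table II but orders the vocabulary alphabetically).

```python
import math

# Vocabulary and pre-processed tweets copied from Table I.
VOCABULARY = ["delicious", "beef", "cheese", "burger", "mcdonald", "taste",
              "cheeseburger", "hamburger", "late", "service", "slow"]
TWEET_1 = ["delicious", "beef", "cheese", "burger", "mcdonald", "taste",
           "cheeseburger", "hamburger"]
TWEET_2 = ["late", "service", "mcdonald", "delicious", "hamburger", "slow"]


def bag_of_words(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Binary BoW vector: 1 if the vocabulary word occurs in the tweet, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def term_frequency(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """TF(t, d) = occurrences of t in d / total number of terms in d."""
    return [tokens.count(word) / len(tokens) for word in vocabulary]


def inverse_document_frequency(documents: list[list[str]],
                               vocabulary: list[str]) -> list[float]:
    """IDF(t) = ln(N / df). Assumes every vocabulary word occurs in at
    least one document (true when the vocabulary is built from them)."""
    n_docs = len(documents)
    return [math.log(n_docs / sum(1 for doc in documents if word in doc))
            for word in vocabulary]


def tf_idf(documents: list[list[str]], vocabulary: list[str]) -> list[list[float]]:
    """One TF-IDF vector per document: element-wise TF times IDF."""
    idf = inverse_document_frequency(documents, vocabulary)
    return [[tf * weight for tf, weight in zip(term_frequency(doc, vocabulary), idf)]
            for doc in documents]
```

`bag_of_words(TWEET_1, VOCABULARY)` gives `[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]`, matching the BoW vector of Tweet 1, and words shared by both tweets (delicious, mcdonald, hamburger) get a TF-IDF weight of 0 because ln(2/2) = 0.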

II-D Classification

We used a train-test split to divide the dataset into 75% for training and 25% for testing. We then trained supervised Machine Learning classifiers on the training data and evaluated them on the testing data. The algorithms applied are Support Vector Machine, Multinomial Naive Bayes, Random Forest, and Logistic Regression.
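The split-and-train step can be sketched with scikit-learn. This is an illustration under assumptions, not the authors' code: the paper does not state its library, hyperparameters, or SVM kernel, so `LinearSVC`, the default hyperparameters, `max_iter=1000` for Logistic Regression, `random_state=42`, and the helper name `compare_classifiers` are all choices made here. Unlike the order of the steps in the methodology (vectorize, then split), the sketch fits the vectorizer on the training split only, so the test tweets cannot influence the vocabulary or IDF weights.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def compare_classifiers(texts, labels, vectorizer, random_state=42):
    """Train each classifier on 75% of the tweets; return test accuracy per model."""
    train_text, test_text, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=random_state)
    # Learn the vocabulary (and IDF weights, for TF-IDF) from training tweets
    # only, then map the test tweets onto that same vocabulary.
    X_train = vectorizer.fit_transform(train_text)
    X_test = vectorizer.transform(test_text)
    classifiers = {
        "SVM": LinearSVC(),  # assumption: linear kernel
        "MultinomialNB": MultinomialNB(),
        "RandomForest": RandomForestClassifier(random_state=random_state),
        "LogisticRegression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, clf.predict(X_test))
    return results
```

Calling `compare_classifiers(texts, labels, CountVectorizer(binary=True))` and `compare_classifiers(texts, labels, TfidfVectorizer())` on the pre-processed tweets gives one accuracy per algorithm for each vectorization technique, mirroring the layout of Tables VI and VII. For an imbalanced dataset, adding `stratify=labels` to the `train_test_split` call is one option for keeping class proportions equal in both splits.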

III Results and Discussion

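We evaluated the performance of our approaches with the Accuracy, Precision, Recall, and F1-Score metrics. Because the dataset is imbalanced, we report the weighted average of Precision, Recall, and F1-Score, where each class is weighted by its number of true instances. The formulas used for evaluation are as follows, where TP, FP, and FN denote the true positives, false positives, and false negatives of a class:

$$\text{Accuracy} = \frac{\text{number of correctly classified tweets}}{\text{total number of tweets}}$$

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$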
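To make the weighted averaging concrete, here is a dependency-free sketch; the helper name `weighted_scores` is hypothetical. Each per-class score is weighted by that class's share of the true labels (its support), which matches scikit-learn's `precision_recall_fscore_support(..., average="weighted")` when undefined precision or recall is treated as 0. A useful consequence: weighted recall always equals accuracy, because the support weights cancel the recall denominators, which is consistent with the identical Accuracy and Recall columns in Tables VI and VII.

```python
from collections import Counter


def weighted_scores(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1 for multi-class labels."""
    n = len(y_true)
    support = Counter(y_true)  # number of true instances of each class
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Undefined ratios (zero denominators) are treated as 0.
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[c] / n  # classes absent from y_true get weight 0
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For `y_true = ["neg", "neg", "neg", "neu", "pos", "pos"]` and `y_pred = ["neg", "neg", "neu", "neu", "pos", "neg"]`, the sketch gives accuracy and weighted recall of 4/6 and weighted precision of 0.75.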

Tables VI and VII summarize the Accuracy, Precision, Recall, and F1-Score obtained by the applied Machine Learning classification algorithms with each NLP technique. With both techniques, SVM and Logistic Regression achieve the highest accuracy of 77%, with a slight difference in F1-Score.

Algorithm                       Accuracy  Precision  Recall  F1-Score
Support Vector Machine (SVM)    0.77      0.76       0.77    0.75
Multinomial Naive Bayes         0.74      0.72       0.74    0.72
Random Forest                   0.74      0.73       0.74    0.73
Logistic Regression             0.77      0.77       0.77    0.77
Table VI: Classification algorithms with Bag-of-Words (BoW)
Algorithm                       Accuracy  Precision  Recall  F1-Score
Support Vector Machine (SVM)    0.77      0.76       0.77    0.74
Multinomial Naive Bayes         0.70      0.72       0.70    0.63
Random Forest                   0.75      0.73       0.75    0.73
Logistic Regression             0.77      0.76       0.77    0.76
Table VII: Classification algorithms with Term Frequency - Inverse Document Frequency (TF-IDF)

Finally, in Table VIII, we compare the Accuracy and F1-Score of the classification algorithms with BoW and with TF-IDF, and we select the best approaches for Sentiment Analysis based on our experiments. SVM and Logistic Regression provide the highest accuracy of 77% with the Bag-of-Words technique.

                                Performance with BoW    Performance with TF-IDF
Algorithm                       Accuracy  F1-Score      Accuracy  F1-Score
Support Vector Machine (SVM)    0.77      0.75          0.77      0.74
Logistic Regression             0.77      0.77          0.77      0.76
Table VIII: Comparison Between Approaches of Table VI and Table VII
  • Comparison Between Related Works and Approaches of This Paper:

    Title & Year
        Algorithm                   Accuracy

    Sentiment Analysis Using Naive Bayes Algorithm Of The Data Crawler: Twitter (2019) [5]
        Support Vector Machine      63.99%
    Twitter Sentiments Analysis Using Machine Learning Methods (2020) [6]
        Support Vector Machine      74.60%
    Sentiment Analysis for Airline Tweets Utilizing Machine Learning Techniques (2021) [7]
        Support Vector Machine      74.24%
    An Efficient Approach for Sentiment Analysis Using Machine Learning Algorithm (2020) [8]
        Support Vector Machine      68.00%
    Collaborative Classification Approach for Airline Tweets Using Sentiment Analysis (2021) [9]
        Support Vector Machine      65.59%
        Logistic Regression         77.42%
        Random Forest               75.29%
    A Comparative Analysis of Various Machine Learning Based Social Media Sentiment Analysis and Opinion Mining Approaches (2020) [10]
        Support Vector Machine      50.00%
        Logistic Regression         74.10%
        Random Forest               70.90%
    A Study on The Performance of Supervised Algorithms for Classification in Sentiment Analysis (2019) [11]
        Support Vector Machine      66.59%
        Random Forest               49.67%
    Sentiment Analysis of Arabic and English Tweets (2019) [12]
        Multinomial Naive Bayes     70.00%
        Logistic Regression         74.00%
    The approaches of this paper
        Support Vector Machine      77.00%
        Logistic Regression         77.00%
    Table IX: Comparison Between Related Works and Approaches of This Paper

    In Table IX, we compare the accuracy of our selected approaches with some recent related works. In [5] and [8], the authors applied different data pre-processing techniques and ML algorithms; our SVM approach is 13 and 9 percentage points more accurate, respectively. In [9], the authors applied different algorithms and proposed a voting classifier; our SVM approach is about 11 percentage points more accurate than the SVM reported there, while our Logistic Regression accuracy (77.00%) is close to theirs (77.42%). In short, the works [5] to [12] apply various techniques and ML algorithms, and the approaches selected in this paper perform comparatively well against these existing studies.

IV Conclusion

In this paper, we have implemented various Machine Learning classification algorithms and NLP techniques on a large, imbalanced, multi-classed, real-world dataset to analyze sentiment. Our best approaches provide 77% accuracy with both the Support Vector Machine and Logistic Regression algorithms along with the Bag-of-Words technique. In the future, we would like to apply more advanced techniques to increase accuracy and to build a generalized and robust model for similar datasets.


  • [1] P. Suchánek and M. Králová, “Effect of customer satisfaction on company performance,” Acta Univ. Agric. Silvic. Mendel. Brun., vol. 63, no. 3, pp. 1013–1021, 2015.
  • [2] S. Ilias and M. F. Shamsudin, “Customer satisfaction and business growth,” JUSST, vol. 2, no. 2, 2020.
  • [3] D. M. A. Baker, “Service quality and customer satisfaction in the airline industry: A comparison between legacy airlines and low-cost airlines,” Am. J. Tour. Res., vol. 2, no. 1, 2013.
  • [4] F. Alattar and K. Shaalan, “Using artificial intelligence to understand what causes sentiment changes on social media,” IEEE Access, vol. 9, pp. 61756–61767, 2021.

  • [5] M. Wongkar and A. Angdresey, “Sentiment analysis using naive Bayes algorithm of the data crawler: Twitter,” in 2019 Fourth International Conference on Informatics and Computing (ICIC), 2019, pp. 1–5.
  • [6] L. Mandloi and R. Patel, “Twitter sentiments analysis using machine learninig methods,” in 2020 International Conference for Emerging Technology (INCET), 2020, pp. 1–5.
  • [7] G. Ravi Kumar, K. Venkata Sheshanna, and G. Anjan Babu, “Sentiment analysis for airline tweets utilizing machine learning techniques,” in International Conference on Mobile Computing and Sustainable Informatics, Cham: Springer International Publishing, 2021, pp. 791–799.
  • [8] A. Naresh and P. Venkata Krishna, “An efficient approach for sentiment analysis using machine learning algorithm,” Evol. Intell., vol. 14, no. 2, pp. 725–731, 2021.
  • [9] M. V. K. et al., “Collaborative classification approach for airline tweets using sentiment analysis,” Turk. J. Comput. Math. Educ. (TURCOMAT), vol. 12, no. 3, pp. 3597–3603, 2021.
  • [10] K. Jayamalini, M. Ponnavaikko, and J. Kothandan, “A comparative analysis of various machine learning based social media sentiment analysis and opinion mining approaches,” Adv. Math., Sci. J., vol. 9, no. 11, pp. 10195–10209, 2020.
  • [11] P. B. Sunitha, S. Joseph, and P. V. Akhil, “A study on the performance of supervised algorithms for classification in sentiment analysis,” in TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON), 2019.
  • [12] M. K. Elhadad, K. F. Li, and F. Gebali, “Sentiment analysis of Arabic and English tweets,” in Advances in Intelligent Systems and Computing, Cham: Springer International Publishing, 2019, pp. 334–348.
  • [13] Kaggle, “Twitter US Airline Sentiment,” https://www.kaggle.com/crowdflower/twitter-airline-sentiment (last accessed 17 May 2020).