Sentiment analysis in tweets: an assessment study from classical to modern text representation models

by Sérgio Barreto, et al.

With the growth of social media platforms such as Twitter, large amounts of user-generated data emerge daily. The short texts published on Twitter, known as tweets, have earned significant attention as a rich source of information to guide many decision-making processes. However, their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks, including sentiment analysis. Sentiment classification is tackled mainly by machine learning-based classifiers. The literature has adopted word representations of different kinds to transform tweets into vector-based inputs for sentiment classifiers. These representations range from simple count-based methods, such as bag-of-words, to more sophisticated ones, such as BERTweet, built upon the popular BERT architecture. Nevertheless, most studies evaluate those models on only a small number of datasets. Despite the progress made in recent years in language modelling, there is still a gap regarding a robust evaluation of induced embeddings applied to sentiment analysis on tweets. Furthermore, while fine-tuning models on downstream tasks is prominent nowadays, less attention has been given to adjustments based on the specific linguistic style of the data. In this context, this study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets from distinct domains and five classification algorithms. The evaluation includes static and contextualized representations. The contextualized representations are obtained from Transformer-based autoencoder models that are also fine-tuned on the masked language model task, using a variety of strategies.
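As a rough illustration of the simplest representation mentioned in the abstract, the sketch below turns tweets into count-based bag-of-words vectors using only the Python standard library. It is not the authors' pipeline: the whitespace tokenizer, the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`), and the two toy tweets are assumptions made purely for illustration.

```python
from collections import Counter


def tokenize(tweet):
    # Naive lowercased whitespace tokenizer (an assumption for this sketch);
    # real tweet pipelines usually also handle URLs, mentions, and emojis.
    return tweet.lower().split()


def build_vocabulary(tweets):
    # Map each distinct token to a column index, sorted for determinism.
    tokens = sorted({tok for tweet in tweets for tok in tokenize(tweet)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(tweet, vocabulary):
    # Count-based vector with one entry per vocabulary token;
    # tokens outside the vocabulary are ignored.
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(tweet)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


# Toy tweets, invented for illustration only.
tweets = ["good good day", "bad day"]
vocabulary = build_vocabulary(tweets)
vectors = [bag_of_words(t, vocabulary) for t in tweets]
```

Vectors of this kind could then be passed to any standard classifier. A count-based vector ignores word order and context, which is the limitation that contextualized representations such as BERTweet aim to address.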




L3CubeMahaSent: A Marathi Tweet-based Sentiment Analysis Dataset

Sentiment analysis is one of the most fundamental tasks in Natural Langu...

FinEAS: Financial Embedding Analysis of Sentiment

We introduce a new language representation model in finance called Finan...

Tweet Insights: A Visualization Platform to Extract Temporal Insights from Twitter

This paper introduces a large collection of time series data derived fro...

emojiSpace: Spatial Representation of Emojis

In the absence of nonverbal cues during messaging communication, users e...

Embedding generation for text classification of Brazilian Portuguese user reviews: from bag-of-words to transformers

Text classification is a natural language processing (NLP) task relevant...

Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models

Large language models (LLMs) have demonstrated impressive performance on...
