emojiSpace: Spatial Representation of Emojis

09/12/2022
by Moeen Mostafavi, et al.

In the absence of nonverbal cues during messaging communication, users express part of their emotions using emojis. Thus, having emojis in the vocabulary of text-messaging language models can significantly improve many natural language processing (NLP) applications, such as online communication analysis. However, word embedding models are usually trained on very large text corpora, such as the Wikipedia or Google News datasets, which contain very few emojis. In this study, we create emojiSpace, a combined word-emoji embedding built with the word2vec model from the Gensim library in Python. We trained emojiSpace on a corpus of more than 4 billion tweets and evaluated it on an extrinsic task: sentiment analysis of a Twitter dataset containing more than 67 million tweets. For this task, we compared two classifiers, a random forest (RF) and a linear support vector machine (SVM). We also compared emojiSpace with two other pre-trained embeddings and demonstrated that emojiSpace outperforms both.
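The abstract does not describe the preprocessing or how tweet-level features were built. The sketch below rests on our own assumptions, not the authors' pipeline. Gensim's `Word2Vec` takes its corpus as an iterable of token lists. A tokenizer that keeps each emoji as a separate token is therefore enough to place emojis in a shared word-emoji vocabulary. The emoji code-point ranges, the URL and @mention stripping, and the helper names `tokenize` and `tweet_vector` are all hypothetical. Averaging token vectors is one common way to turn an embedding into fixed-length features for classifiers such as an RF or a linear SVM, and it is not necessarily what the paper does.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical, simplified emoji class: main pictograph blocks
# (U+1F300..U+1FAFF) plus miscellaneous symbols and dingbats (U+2600..U+27BF).
# Skin-tone modifiers, ZWJ sequences and flag pairs are NOT merged; each
# code point in these ranges becomes its own token.
EMOJI_CLASS = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"

# An emoji code point, or a word made of letters/digits with optional inner
# apostrophes ("don't"). Punctuation and variation selectors (U+FE0F) are dropped.
TOKEN_RE = re.compile(EMOJI_CLASS + r"|[^\W_]+(?:'[^\W_]+)*")

# Assumption: URLs and @mentions are removed before tokenizing.
URL_OR_MENTION_RE = re.compile(r"https?://\S+|@\w+")


def tokenize(tweet: str) -> List[str]:
    """Lowercase a tweet and split it into word and emoji tokens."""
    text = URL_OR_MENTION_RE.sub(" ", tweet.lower())
    return TOKEN_RE.findall(text)


def tweet_vector(
    tokens: Sequence[str],
    embeddings: Dict[str, Sequence[float]],
    dim: int,
) -> List[float]:
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    total = [0.0] * dim
    known = 0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:  # out-of-vocabulary token: skip it
            continue
        for i, x in enumerate(vec):
            total[i] += x
        known += 1
    if known == 0:
        return total
    return [x / known for x in total]


if __name__ == "__main__":
    print(tokenize("I love this 😂😂 https://t.co/x @bob!"))
    # Toy 2-d vectors for illustration only, not real embeddings.
    toy = {"love": [1.0, 0.0], "😂": [0.0, 1.0]}
    print(tweet_vector(["i", "love", "😂"], toy, 2))
```

In a real run, the token lists would be passed to Gensim, for example `Word2Vec(sentences=corpus, vector_size=..., window=..., min_count=...)` in Gensim 4.x. A dictionary built from the learned `model.wv` vectors would then replace the toy one. The averaged vectors could be fed to any RF or linear SVM implementation, for instance scikit-learn's `RandomForestClassifier` or `LinearSVC`.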


