A Semantics-Based Measure of Emoji Similarity

07/14/2017 ∙ by Sanjaya Wijeratne, et al. ∙ 0

Emoji have grown to become one of the most important forms of communication on the web. With their widespread use, measuring the similarity of emoji has become an important problem for contemporary text processing since it lies at the heart of sentiment analysis, search, and interface design tasks. This paper presents a comprehensive analysis of the semantic similarity of emoji through embedding models that are learned over machine-readable emoji meanings in the EmojiNet knowledge base. Using emoji descriptions, emoji sense labels and emoji sense definitions, and with different training corpora obtained from Twitter and Google News, we develop and test multiple embedding models to measure emoji similarity. To evaluate our work, we create a new dataset called EmoSim508, which assigns human-annotated semantic similarity scores to a set of 508 carefully selected emoji pairs. After validation with EmoSim508, we present a real-world use-case of our emoji embedding models using a sentiment analysis task and show that our models outperform the previous best-performing emoji embedding model on this task. The EmoSim508 dataset and our emoji embedding models are publicly released with this paper and can be downloaded from http://emojinet.knoesis.org/.



1 Introduction

With the rise of social media, pictographs, commonly referred to as ‘emoji’, have become one of the world’s fastest-growing forms of communication (https://goo.gl/jbeRYW). This rapid growth began in 2011, when Apple added an emoji keyboard to iOS, and continued in 2013, when the Android mobile platform started to support emoji on its devices [8]. Emoji permeate modern online and web-based communication and are now regarded as a natural and common form of expression. In fact, the Oxford Dictionary named ‘face with tears of joy’ (U+1F602) as the word of the year in 2015 (https://goo.gl/6oRkVg). Not only individuals but also business organizations have adopted emoji, with a 777% year-over-year increase and a 20% month-over-month increase in emoji usage for marketing campaigns in 2016 (https://goo.gl/ttxyP1). Major search engines, including Bing (https://goo.gl/5iy8Dx) and Google (https://goo.gl/oDfZTQ), now support web searches involving emoji as search terms.

While the analysis and modeling of written text by Natural Language Processing (NLP) techniques have enabled important advances such as machine translation [35], word sense disambiguation [26], and search [10], the transfer of such methods (or the development of new methods) to emoji is only beginning to be explored [38]. The ability to automatically process, derive meaning from, and interpret text fused with emoji will be essential as society embraces emoji as a standard form of online communication. A way to measure similarity will be foundational to many emoji analysis tasks, including: (i) corpus searching, where documents (or a query) contain emoji symbols [6]; (ii) sentiment analysis [4, 9], where emoji sentiment lexicons [28] are known to improve performance; and (iii) interface design, mainly in optimizing mobile phone keyboards [30]. In fact, as of 2017, the poor design of emoji keyboards for mobile devices may be familiar to the reader: there are 2,389 emoji supported by the Unicode Consortium, yet listing and searching through all of them on a mobile keyboard is a time-consuming task. Grouping similar emoji together could lead to optimized emoji keyboard designs for mobile devices [30].

The notion of the similarity of two emoji is very broad. One can imagine a similarity measure based on the pixel similarity of emoji pictographs, yet this may not be useful since the pictorial representation of an emoji varies by mobile and computer platform [25, 33, 7]. Two similar-looking pictographs may also correspond to emoji with radically different senses (e.g., twelve thirty (U+1F567) and six o’clock (U+1F555); raised hand (U+270B) and raised back of hand (U+1F91A); octopus (U+1F419) and squid (U+1F991)) [37, 38]. Instead, we are interested in measuring the semantic similarity of emoji such that the measure reflects the likeness of their meaning, interpretation or intended use. Understanding the semantics of emoji requires access to a repository of emoji meanings and interpretations. The release of a new resource called EmojiNet [38] offers free and open access to an aggregation of such meanings and interpretations (called senses) collected from major emoji databases on the Internet (e.g., The Unicode Consortium, The Emoji Dictionary, and Emojipedia).

A collection of emoji sense definitions can enable a semantics-based measure of similarity through vector word embeddings. Word embeddings are a powerful and proven way [22] to measure word similarity based on word meaning. They have been widely used in semantic similarity tasks [13, 15, 5] and empirically shown to improve the performance of word similarity tasks when used with proper parameter settings [18]. Word vectors also provide a convenient way of comparing words with one another. Thus, emoji meanings can be represented with word embedding models to generate vectors that encode those meanings; we call the resulting models emoji embedding models.

In this paper, we present a comprehensive study on measuring the semantic similarity of emoji using emoji embedding models. We extract machine-readable emoji meanings from EmojiNet to model the meaning of an emoji. Using pre-trained word embedding models learned over a Twitter dataset of 110 million tweets and a Google News text corpus of 100 billion words, we encode the extracted emoji meanings to obtain emoji embedding models. To create a gold standard dataset for evaluating how well the emoji embeddings measure similarity, we ask ten human annotators who are knowledgeable about emoji to manually rate the similarity of 508 pairs of emoji. This dataset of human annotations, which we call ‘EmoSim508’, is made available with this paper for use by other researchers. We evaluate the emoji embeddings by first establishing, using statistical measures, that the similarity measured by our embedding models aligns with the ratings of the human annotators. Then, we apply our emoji embedding models to a sentiment analysis task to demonstrate their utility in a real-world NLP application. Our models were able to correctly predict the sentiment class of tweets laden with emoji from a benchmark dataset [28] with an accuracy of 63.6 (7.73% improvement), outperforming the previous best results on the same dataset [4, 9].

This paper is organized as follows. Section 2 discusses the related literature and frames how this work differs from and furthers existing research. Section 3 discusses how the emoji meanings are represented using the different emoji definitions extracted from EmojiNet and how the emoji embeddings are learned. Section 4 explains the creation of the EmoSim508 dataset. Section 5 reports how well the emoji embedding models perform on an emoji similarity analysis task and Section 6 reports the performance of our emoji embedding models in a downstream sentiment analysis task. Section 7 offers concluding remarks and plans for future work.

2 Related Work

While emoji were introduced in the late 1990s, their use and popularity were limited until the Unicode Consortium started to standardize emoji symbols in 2009 [37]. Major mobile phone manufacturers such as Apple, Google, Microsoft, and Samsung then began supporting emoji in their device operating systems between 2011 and 2013, which boosted emoji adoption around the world [8]. Early research on emoji focused on understanding the role of emoji in computer-mediated communication. Kelly et al. studied how people in close relationships use emoji in their communications and reported that they use emoji as a way of making their conversations playful [16]. Pavalanathan et al. studied how Twitter users adopt emoji and reported that Twitter users prefer emoji over emoticons [29]. Researchers have also studied how emoji usage and interpretation differ across mobile and computer platforms [25, 33, 7], geographies [20], and languages [3], while many others have used emoji as features in their learning algorithms for problems such as emoji-based search [6], sentiment analysis [28], emotion analysis [34], and Twitter profile classification [2, 36].

Emoji similarity has received little attention apart from three attempts by Barbieri et al. [4], Eisner et al. [9] and Pohl et al. [30]. Barbieri et al. [4] collected a sample of 10 million tweets originating in the USA and trained an emoji embedding model using the tweets as input. Then, using 50 manually-generated emoji pairs annotated by humans for emoji similarity and relatedness, they evaluated how well the learned emoji embeddings align with the human annotations. They reported that the learned emoji embeddings align more closely with the relatedness judgment scores of human annotators than with the similarity judgment scores. Eisner et al. [9] used a word embedding model learned over the Google News corpus (https://goo.gl/QaxjVC), applied it to emoji names and keywords extracted from the Unicode Consortium website, and learned an emoji embedding model which they called emoji2vec. Using t-SNE for data visualization [21], Eisner et al. showed that the high-dimensional emoji embeddings learned by emoji2vec could group emoji into clusters based on their similarity. They also showed that their emoji embedding model could outperform Barbieri et al.’s model in a sentiment analysis task. Pohl et al. [30] studied the emoji similarity problem using two methods: one based on the emoji keywords extracted from the Unicode Consortium website and the other based on emoji embeddings learned from a Twitter message corpus. They used the Jaccard Coefficient (https://goo.gl/RKkRzF) on the emoji keywords extracted from the Unicode Consortium to find the similarity of two emoji. They evaluated their approach using 90 manually-generated emoji pairs and argued that emoji similarity can be used to optimize the design of emoji keyboards.
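The keyword-based measure attributed to Pohl et al. above can be illustrated with a minimal sketch of the Jaccard Coefficient over two keyword sets. The first keyword list is the one EmojiNet records for the raising hands emoji (U+1F64C); the second list is hypothetical and only serves the illustration, as is the choice to return 0.0 when both sets are empty.

```python
def jaccard(keywords_a, keywords_b):
    """Jaccard Coefficient |A ∩ B| / |A ∪ B| of two keyword collections."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0  # convention assumed here for two empty keyword sets
    return len(a & b) / len(a | b)


# Keywords of the raising hands emoji as listed in EmojiNet.
raising_hands = ["celebration", "hand", "hooray", "raised"]
# Hypothetical keyword list for a second emoji (not taken from any database).
clapping_hands = ["hand", "clap", "celebration"]

score = jaccard(raising_hands, clapping_hands)  # 2 shared keywords out of 5 distinct ones
```

Because the measure only counts exact keyword overlap, two emoji described with synonyms (e.g., "gun" and "pistol") would score zero, which is one motivation for embedding-based measures.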

Our work differs from the related works discussed above in several ways. Barbieri et al. [4] use the distributional semantics [11] of words learned over a Twitter corpus, seeking an understanding of emoji meanings from how emoji are used in a large collection of tweets. In contrast, this paper learns emoji embeddings based on emoji meanings extracted from EmojiNet. We learn the distributional semantics of the words in emoji definitions using word embeddings learned over two large text corpora and use the learned word embeddings to model the emoji meanings extracted from EmojiNet. Hence, we combine emoji meanings extracted from knowledge bases (i.e., EmojiNet) with the distributional semantics of the words in emoji definitions. Pohl et al. [30] learn emoji embedding models in the same way as Barbieri et al. and use the Jaccard Coefficient on emoji keywords extracted from the Unicode Consortium to measure similarity. This is similar to our earlier work on emoji similarity [38], which we build upon in this paper. Eisner et al. [9] presented an embedding model built on the short emoji names and keywords listed on the Unicode Consortium website, which are approximately 4 to 5 words long on average, as reported by Pohl et al. in [30]. Since prior research suggests that emoji embedding models can be improved by incorporating more words through longer emoji definitions [9, 30], we introduce embeddings based on three different types of long-form definitions of an emoji.

Nonuple Element    Description
---------------    -----------
Unicode            U+1F64C
Emoji Name         Raising Hands
Short Code         :raised_hands:
Definition         Two hands raised in the air, celebrating success or an event.
Keywords           celebration, hand, hooray, raised
Images             [platform-specific renderings of the emoji; images not reproduced]
Related Emoji      Confetti Ball, Clapping Hands Sign
Emoji Category     Gesture symbols
Senses             Sense Label: celebration(Noun)
                   Def: A joyful occasion for special festivities to mark a happy event.

Table 1: Nonuple Representation of an Emoji

3 Learning Emoji Embedding Models

In this section, we briefly present the EmojiNet resource and the different types of emoji sense definitions it contains. We subsequently discuss the training of emoji embedding models, constructed from the sense definitions extracted from EmojiNet.

3.1 EmojiNet

EmojiNet is a comprehensive machine-readable emoji sense inventory [38]. EmojiNet maps emoji to their set of possible meanings or senses. It consists of 12,904 sense labels over 2,389 emoji, which were extracted from the web and linked to machine-readable sense definitions seen in BabelNet [27]. Each emoji in EmojiNet is represented as a nonuple representing its sense and other metadata. For each emoji $e_i$, the nonuple is given as $e_i = (u_i, n_i, c_i, d_i, K_i, I_i, R_i, H_i, S_i)$, where $u_i$ is the Unicode representation of $e_i$, $n_i$ is the name of $e_i$, $c_i$ is the short code of $e_i$, $d_i$ is a description of $e_i$, $K_i$ is the set of keywords that describe intended meanings attached to $e_i$, $I_i$ is the set of images that are used in different rendering platforms, $R_i$ is the set of related emoji extracted for $e_i$, $H_i$ is the set of categories that $e_i$ belongs to, and $S_i$ is the set of different senses in which $e_i$ can be used within a sentence. Apart from this, each sense in $S_i$ is defined as a combination of a word (e.g., laugh), its part-of-speech (POS) tag (e.g., noun), and its definition in a message context or gloss (e.g., Produce laughter). An example of the nonuple notation is shown in Table 1. EmojiNet is hosted as an open service with a REST API at http://emojinet.knoesis.org/.
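To make the nonuple concrete, the following is a minimal sketch of how one EmojiNet entry could be held in memory. The class and field names (`EmojiRecord`, `Sense`, `short_code`, and so on) are hypothetical and are not the schema of the EmojiNet REST API; the values are those given for the raising hands emoji in Table 1.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Sense:
    """One emoji sense: a word, its POS tag, and its definitions (glosses)."""
    word: str
    pos: str
    definitions: List[str] = field(default_factory=list)

    @property
    def label(self) -> str:
        # Sense labels are word-POS pairs such as celebration(Noun).
        return f"{self.word}({self.pos})"


@dataclass
class EmojiRecord:
    """Hypothetical container mirroring the nine elements of the nonuple."""
    unicode: str           # Unicode representation
    name: str              # emoji name
    short_code: str        # short code
    description: str       # textual description
    keywords: List[str]    # keywords describing intended meanings
    images: List[str]      # platform-specific renderings
    related: List[str]     # related emoji
    categories: List[str]  # categories the emoji belongs to
    senses: List[Sense]    # senses in which the emoji can be used


raising_hands = EmojiRecord(
    unicode="U+1F64C",
    name="Raising Hands",
    short_code=":raised_hands:",
    description="Two hands raised in the air, celebrating success or an event.",
    keywords=["celebration", "hand", "hooray", "raised"],
    images=[],  # image data omitted in this sketch
    related=["Confetti Ball", "Clapping Hands Sign"],
    categories=["Gesture symbols"],
    senses=[Sense("celebration", "Noun",
                  ["A joyful occasion for special festivities to mark a happy event."])],
)

num_elements = len(fields(EmojiRecord))  # one field per nonuple element
```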

3.2 Representation of Emoji Meaning

We consider three different ways to represent the meaning of an emoji using the information in EmojiNet. Specifically, we extract emoji descriptions, emoji sense labels, and the emoji sense definitions of each emoji sense from EmojiNet to model the meaning of an emoji. We discuss each briefly below:

Emoji Description (Sense_Desc.): Emoji descriptions give an overview of what is depicted in an emoji and its intended uses. For example, for the pistol emoji (U+1F52B), EmojiNet lists “A gun emoji, more precisely a pistol. A weapon that has potential to cause great harm. Displayed facing right-to-left on all platforms” as its description. One could use this information to get an understanding of how the pistol emoji should be used in a message.

Emoji Sense Labels (Sense_Label): Emoji sense labels are word-POS tag pairs (such as laugh(noun)) that describe the senses and their part-of-speech under which an emoji can be used in a sentence. Emoji sense labels can act as words that convey the meaning of an emoji and thus can play an important role in understanding the meaning of an emoji. For example, for the pistol emoji (U+1F52B), EmojiNet lists 12 emoji sense labels consisting of 6 nouns (gun, weapon, pistol, violence, revolver, handgun), 3 verbs (shoot, gun, pistol) and 3 adjectives (deadly, violent, deathly).

Emoji Sense Definitions (Sense_Def.): Emoji sense definitions are the textual descriptions that explain each sense label and how those sense labels should be used in a sentence. For example, for the gun(Noun) sense label of the pistol emoji (U+1F52B), EmojiNet lists 5 sense definitions that complement each other (https://goo.gl/gm7TQ2). These emoji sense definitions can be valuable in understanding the meaning of an emoji; thus, we use them to represent the meaning of an emoji.

3.3 Learning the Emoji Embedding Models

Once the machine-readable emoji descriptions are extracted from EmojiNet, we use word embedding models [22] to convert them into a vectorial representation. A word embedding model is a neural network that learns rich representations of words in a text corpus. It takes data from a large, $n$-dimensional ‘word space’ (where $n$ is the number of unique words in a corpus) and learns a transformation of the data into a lower $k$-dimensional space of real numbers. This transformation is developed in a way that similarities between the $k$-dimensional vector representations of two words reflect semantic relationships among the words themselves. Word embedding models are inspired by the distributional hypothesis (i.e., words that co-occur in the same contexts tend to carry similar meanings); hence the semantic relationships among word vectors are learned based on word co-occurrence in contexts (e.g., sentences) extracted from large text corpora. Mikolov et al. have shown that these word embeddings can learn different types of semantic relationships, including gender relationships (e.g., King-Queen) and class inclusion (e.g., Clothing-Shirt), among many others [24]. Similar to word embedding models, an emoji embedding model is defined as an emoji symbol and its learned $k$-dimensional vector representation.

Figure 1: Learning Emoji Embedding Models using Word Vectors

We chose two different types of datasets, namely a tweet corpus and a Google News corpus, to train emoji embedding models. We made this selection to make it easy to compare our emoji embedding models with other works that have used embedding models based on tweet text and Google News text. To train the Twitter word embedding model, we first collected a Twitter dataset that contained emoji using the Twitter public streaming API (https://dev.twitter.com/streaming/public). The dataset was collected using emoji Unicodes as filters over a four week period, from August 6, 2016 to September 8, 2016. It consists of 147 million tweets containing emoji. We first removed all retweets and then converted all emoji in the remaining 110 million unique tweets into textual features using the Emoji for Python API (https://pypi.python.org/pypi/emoji/). The tweets were then stemmed before being processed with Word2Vec [22] using a Skip-gram model with negative sampling. This process is depicted in Figure 1. We chose the Skip-gram model with negative sampling to train our model as it has been shown to generate robust word embedding models even when certain words are less frequent in the training corpus [23]. We set the number of dimensions of our model to 300 and the negative sampling rate to 10 sample words, settings which have been shown to work well empirically [23]. We set the context word window to 5 (words $w_{t-5}$ to $w_{t+5}$ in Figure 1) so that it considers 5 words to the left and right of the target word (word $w_t$ in Figure 1) at each iteration of the training process. This setting is suitable for sentences where the average sentence length is less than 11 words, as is the case in tweets [14]. We ignore the words that occur fewer than 10 times in our Twitter dataset when training the word embedding model. We use a publicly available word embedding model that is trained over the Google News corpus (https://goo.gl/QaxjVC) to obtain Google News word embeddings.
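The role of the context window and the minimum word frequency can be illustrated with a small sketch that enumerates the (target, context) pairs a Skip-gram model would see. This is only an assumption-laden illustration: the function name `skipgram_pairs` is hypothetical, rare words are dropped before windowing, and the stemming, negative sampling, and the training itself are omitted.

```python
from collections import Counter


def skipgram_pairs(sentences, window=5, min_count=10):
    """List (target, context) pairs for tokenized sentences.

    Words seen fewer than `min_count` times in the whole corpus are dropped
    first; each remaining word is then paired with up to `window` words on
    its left and on its right.
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    pairs = []
    for sent in sentences:
        kept = [tok for tok in sent if counts[tok] >= min_count]
        for i, target in enumerate(kept):
            lo = max(0, i - window)
            hi = min(len(kept), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((target, kept[j]))
    return pairs


# Toy corpus (hypothetical); the paper's settings would be window=5, min_count=10.
toy = [["good", "morning", "sun"], ["good", "night", "moon"]]
pairs = skipgram_pairs(toy, window=1, min_count=1)
rare_filtered = skipgram_pairs(toy, window=1, min_count=2)  # only "good" survives
```

With a window of 5, a sentence shorter than 11 words is covered entirely by the window around its middle word, which matches the rationale given for tweets.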

We use the learned word vectors to represent the different types of emoji definitions listed in Section 3.2. All words in each emoji definition are replaced with their corresponding word vectors as shown in Figure 1. For example, all words in the description of the pistol emoji (U+1F52B), which is “A gun emoji, more precisely a pistol. A weapon that has potential to cause great harm. Displayed facing right-to-left on all platforms”, are replaced by the word vectors learned for each word. Then, to get the emoji embedding, the word vectors of all words in the emoji definition are averaged to form a final single vector of size 300 (the dimension size). The vector mean (or average) adjusts for word embedding bias that could take place due to certain emoji definitions having considerably more words than others [17]. If the total number of words in the emoji definition is $m$, the combined word vector $\vec{V}$ is calculated by:

$$\vec{V} = \frac{1}{m} \sum_{j=1}^{m} \vec{w}_j$$

where $\vec{w}_j$ is the word vector of the $j$-th word in the emoji definition.

Using the three different emoji definitions and two types of word vectors learned over Twitter and the Google News corpora, we learn six different embeddings for each emoji. Then we integrate all words in the three types of emoji definitions into a set called (Sense_All) and learn two more emoji embeddings over them by using the two types of word vectors. We take this step as prior research suggests that having more words in an emoji definition could improve the embeddings learned over them [9, 30]. We thus learn a total of 8 embeddings for emoji. The utility of each embedding as a similarity measure is discussed next.
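The averaging step described above can be sketched in a few lines. The toy three-dimensional vectors are made up (the models in the paper use 300 dimensions), skipping words without a vector is an assumption of this sketch, and cosine similarity is used here as one common way to compare two embeddings rather than as a statement of the exact measure used in the paper.

```python
import math


def mean_vector(words, word_vectors):
    """Average the vectors of all words that have an entry in word_vectors."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    if not vecs:
        raise ValueError("no word in the definition has a vector")
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical word vectors, for illustration only.
toy_vectors = {
    "gun": [1.0, 0.0, 0.0],
    "weapon": [0.8, 0.2, 0.0],
    "pistol": [0.9, 0.1, 0.0],
    "joy": [0.0, 0.0, 1.0],
}

pistol_emb = mean_vector(["gun", "weapon", "pistol"], toy_vectors)
joy_emb = mean_vector(["joy", "unseenword"], toy_vectors)  # unknown word is skipped
similarity = cosine(pistol_emb, joy_emb)
```

A (Sense_All)-style embedding would be obtained the same way, by passing the concatenated words of all three definition types to `mean_vector`.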

4 Ground Truth Data Curation

Once the emoji embedding vectors are learned, it is necessary to evaluate how well they represent emoji meanings. For this purpose, we create an emoji similarity dataset called ‘EmoSim508’ that consists of 508 emoji pairs which were assigned similarity scores by ten human judges. This section discusses the development of the EmoSim508 emoji similarity dataset, which is available at http://emojinet.knoesis.org/emosim508.php.

4.1 Emoji Pair Selection

Curating a reasonable sample of emoji pairs for human evaluation is a critical step: there are 2,389 emoji, leading to over 5 million emoji pairs, far too many to ask humans to evaluate for similarity. Hand-picking emoji pairs might not be a good approach as such a dataset would not cover a wide variety of similarities or could be biased towards certain relationships that commonly exist among emoji [4]. Furthermore, random sampling of the emoji pairs will lead to many unrelated emoji, as suggested by Barbieri et al. [4], making the dataset less useful as a gold standard dataset. We thus sought to curate the EmoSim508 dataset in such a way that the emoji pairs in it are not hand-picked but still represent a ‘meaningful’ dataset. By meaningful, we mean that the dataset contains emoji pairs that are often seen together in practice. The dataset should also have pairs that are related, unrelated, and the shades in-between, leading to a diverse collection of examples for evaluating a similarity measure. To address this, we considered the most frequently co-occurring emoji pairs from the Twitter corpus used to learn word vectors in Section 3.3 and created a plot of how often pairs of emoji co-occur with each other. From this plot, shown in Figure 2, we selected the top-k emoji pairs that cover 25% of our Twitter dataset (shown as the dotted red line in Figure 2). This resulted in the top 508 emoji pairs. Since the co-occurrence frequency plot is decidedly heavy-tailed (the blue line), we chose the 25% threshold, giving us a dataset which is 10 times bigger than the previous dataset used by Barbieri et al. [4] to calculate emoji similarity. These 508 emoji pairs contain 158 unique emoji. Figure 2 also shows the top 10 and bottom 10 emoji pairs based on their co-occurrence frequency. We can observe that face emoji are dominant in the top 10 emoji pairs, while the bottom 10 contain a few interesting emoji pairs such as U+1F498 and U+1F618, U+1F612 and U+1F629, and U+1F30A and U+1F3C4.
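The pair selection can be sketched as follows. This is one possible reading of the procedure, under the assumption that "coverage" means the share of all pair co-occurrence counts; the function names and the toy tweets (written with short codes) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tweets_emoji):
    """Count, for every unordered emoji pair, the tweets in which both occur."""
    counts = Counter()
    for emoji_in_tweet in tweets_emoji:
        for pair in combinations(sorted(set(emoji_in_tweet)), 2):
            counts[pair] += 1
    return counts


def top_pairs_by_coverage(counts, coverage=0.25):
    """Take the most frequent pairs until they account for `coverage` of all counts."""
    target = coverage * sum(counts.values())
    selected, covered = [], 0
    for pair, count in counts.most_common():
        if covered >= target:
            break
        selected.append(pair)
        covered += count
    return selected


# Toy data (hypothetical): each inner list holds the emoji found in one tweet.
tweets = [
    [":joy:", ":sob:"], [":sob:", ":joy:"], [":joy:", ":sob:", ":joy:"],
    [":joy:", ":heart:"], [":sob:", ":heart:"], [":heart:", ":fire:"],
    [":joy:", ":fire:"], [":sob:", ":fire:"],
]
counts = cooccurrence_counts(tweets)
quarter = top_pairs_by_coverage(counts, coverage=0.25)
half = top_pairs_by_coverage(counts, coverage=0.5)
```

On a heavy-tailed distribution such as the one in Figure 2, a small number of very frequent pairs reaches the coverage threshold quickly, which keeps the annotation workload manageable.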

Figure 2: Emoji Co-Occurrence Frequency Graph

4.2 Human Annotation Task

We use human annotators to assign similarity scores to each emoji pair in the EmoSim508 dataset. A total of ten annotators were used, all of whom were graduate students at Wright State University, and of whom four were male and six were female. Their ages ranged from 24 years to 32 years; past studies suggest people within this age range use emoji frequently (https://goo.gl/GSbCGL). The annotators were shown a webpage with two emoji and were prompted with two questions, one related to emoji equivalence and the other related to emoji relatedness, which they were required to answer on a five-point Likert scale [19] ranging from 0 to 4, where 0 means the emoji were dissimilar and 4 means the emoji were identical. We selected the five-point Likert scale for our study for two main reasons. Firstly, past research has shown that the Likert scale is best suited for questionnaire-based user studies, and five-point scales have been shown empirically to perform better than other scales (seven-point and ten-point) [31]. Secondly, many word similarity experiments involving human annotators have used the same Likert scale from 0 to 4 to calculate the similarity of words [32]. The two questions we asked the annotators were:

  • Q1. How equivalent are these two emoji?
    (i.e., can the use of one emoji be replaced by the other?)

  • Q2. How related are these two emoji?
    (i.e., can one use either emoji in the same context?)

We asked Q1 to understand whether an equivalence relationship exists between an emoji pair, and Q2 to understand whether a relatedness relationship exists between them. Annotators answered the same two questions for all 508 emoji pairs in the EmoSim508 dataset. We then averaged the ordinal values (0 to 4) received as answers for each question separately and assigned the emoji pair an emoji equivalence score and an emoji relatedness score. Then we averaged the two values to calculate the final emoji similarity score for a given pair of emoji. We use the final emoji similarity score to evaluate our emoji embedding models.
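The score aggregation described above amounts to two per-question means followed by a mean of those means. A minimal sketch, with made-up ratings from ten annotators and a hypothetical function name:

```python
from statistics import mean


def emoji_similarity(q1_ratings, q2_ratings):
    """Return (equivalence, relatedness, final similarity) for one emoji pair."""
    equivalence = mean(q1_ratings)   # average of the Q1 answers (0 to 4)
    relatedness = mean(q2_ratings)   # average of the Q2 answers (0 to 4)
    return equivalence, relatedness, (equivalence + relatedness) / 2


# Hypothetical answers of ten annotators for a single emoji pair.
q1 = [4, 4, 3, 4, 4, 3, 4, 4, 4, 4]
q2 = [4, 4, 4, 4, 4, 4, 4, 3, 4, 4]
equivalence, relatedness, similarity = emoji_similarity(q1, q2)
```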

Emoji pairs with the highest agreement (emoji given as Unicode code points)

Rating  Question  Pair 1       Pair 2       Pair 3       Pair 4       Pair 5
------  --------  -----------  -----------  -----------  -----------  -----------
0       Q1        1F4E2/1F648  00A9/1F49B   1F4AF/1F649  1F64T/1F618  1F608/1F459
0       Q2        00A9/1F499   1F3B6/1F525  1F602/1F525  1F4A6/1F449  1F334/1F60D
1       Q1        1F64F/1F44D  0263A/1F618  1F629/1F644  1F60A/1F604  1F914/1F62D
1       Q2        1F445/1F4A6  1F631/1F602  1F644/1F602  1F918/1F602  1F643/1F602
2       Q1        1F60D/1F499  1F602/1F61B  1F3B6/1F4FB  1F602/1F606  1F499/1F618
2       Q2        1F629/1F644  1F60D/1F44C  1F61C/1F602  1F525/1F4A5  1F389/1F496
3       Q1        2764/1F618   1F449/1F448  1F499/1F49A  1F49C/1F618  1F496/1F618
3       Q2        1F48D/1F60D  1F649/1F648  1F64A/1F649  1F498/1F60D  1F3C6/26BD
4       Q1        1F3B5/1F3B6  1F495/2764   1F38A/1F389  1F495/1F60D  0263A/1F60A
4       Q2        1F3B5/1F3B6  0263A/1F60A  1F38A/1F389  2764/1F49E   1F499/1F49A

Table 2: Top-5 Emoji Pairs with Highest Inter-annotator Agreement for Each Ordinal Value from 0 to 4

4.3 Annotation evaluation

We conducted a series of statistical tests to verify that EmoSim508 is a reliable dataset, that is, to ensure that the annotators did not randomly answer the task’s questions [1]. To verify this, we measured the inter-annotator agreement. Since we had ten annotators who used ordinal data to evaluate the similarity of emoji, we selected Krippendorff’s alpha coefficient to calculate the agreement among annotators [12]. We calculated annotator agreement for each question separately and observed an $\alpha$ value of 0.632 for Q1 and an $\alpha$ value of 0.567 for Q2. This tells us that the emoji similarity evaluation was not an easy task for the annotators and that their agreement is slightly better when judging emoji pairs for equivalence than for relatedness. In our dataset, many annotators agreed on the non-equivalence of emoji pairs; we believe this could explain the slightly higher agreement score for equivalence.
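For readers unfamiliar with the coefficient, the following is a compact sketch of Krippendorff’s alpha with the ordinal distance metric, computed from the coincidence matrix of the ratings. It is an illustrative implementation written for this explanation, not the code used to produce the values above; the function name and the toy ratings are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_ordinal(units, levels):
    """Krippendorff's alpha for ordinal ratings.

    units:  one list of ratings per rated item (missing ratings simply omitted)
    levels: all possible rating values in ascending order, e.g. [0, 1, 2, 3, 4]
    """
    # Coincidence matrix: every ordered pair of ratings within an item counts
    # 1 / (m - 1), where m is the number of ratings the item received.
    o = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # an item with a single rating carries no agreement information
        for a, b in permutations(range(m), 2):
            o[(ratings[a], ratings[b])] += 1.0 / (m - 1)

    n_c = {c: sum(o[(c, k)] for k in levels) for c in levels}  # marginal totals
    n = sum(n_c.values())

    def delta2(c, k):
        # Ordinal metric: squared count of ratings lying between ranks c and k.
        lo, hi = sorted((levels.index(c), levels.index(k)))
        between = sum(n_c[levels[g]] for g in range(lo, hi + 1))
        return (between - (n_c[c] + n_c[k]) / 2.0) ** 2

    off_diagonal = [(c, k) for c in levels for k in levels if c != k]
    d_observed = sum(o[(c, k)] * delta2(c, k) for c, k in off_diagonal) / n
    d_expected = sum(n_c[c] * n_c[k] * delta2(c, k) for c, k in off_diagonal) / (n * (n - 1))
    if d_expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - d_observed / d_expected


scale = [0, 1, 2, 3, 4]
perfect = krippendorff_alpha_ordinal([[0, 0, 0], [2, 2, 2], [4, 4, 4]], scale)
noisy = krippendorff_alpha_ordinal([[0, 4, 0], [2, 2, 3], [4, 0, 4]], scale)
```

An alpha of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance, which helps place the reported 0.632 and 0.567 on a scale.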

Figure 3: Distribution of the Mean of User Ratings

To evaluate how reasonable the scores provided by the human annotators are, we look at the emoji pairs with the highest inter-annotator agreement for each ordinal value of the Likert scale (0 to 4) in Table 2. We notice that all annotators have agreed that the U+1F3B5 and U+1F3B6 emoji show an equivalence relationship. All other emoji pairs shown for ordinal value 4, which stands for ‘equivalent or fully related’, show high agreement (a minimum of ) among the annotators. Ordinal value 3, which stands for ‘highly similar or closely related’, shows medium agreement (a minimum of ) among annotators. Ordinal values 1 and 2, standing for ‘slightly similar or slightly related’ and ‘similar or related’, respectively, also show medium agreement (a minimum of ) among the top-5 emoji pairs for each ordinal value. Finally, ordinal value 0, which stands for ‘dissimilar or unrelated’, shows full agreement () among annotators for a total of 184 emoji pairs. The annotators have unanimously agreed that no relatedness relationship exists for a group of 31 emoji pairs and that no equivalence relationship exists for a group of 153 emoji pairs. This further shows that it has been easier for them to agree on the dissimilarity of a pair of emoji than on its similarity or relatedness.

Figure 3 depicts the distribution of the mean of the annotator ratings (line plot) and one standard deviation from the mean (ribbon plot) for each emoji pair for each question. For both questions, the mean of each plot shows a near-linear trend, indicating that our dataset captures diverse types of relationships. Specifically, for question 1, we find a near-linear trend in the mean distribution for emoji pairs where the mean user rating is between 0.66 and 4. For question 2, we find a similar trend for emoji pairs where the mean rating is between 1 and 4. For both questions, the deviation bands are dense, especially in the range of 0.75 to 2.5, which is to be expected. We also note that the deviation does not span beyond one rating (e.g., the deviation bands at a mean of 2 tend to span between 1 and 3). This reasonable deviation further speaks for the diversity of responses. The size of these deviation bands decreases as we approach extreme values (i.e., emoji definitely similar and definitely different). We notice an elbow (from to ) at the start of the mean distribution for Q1 due to the strong agreement among annotators for the unrelated emoji. This shows that even though we selected highly co-occurring emoji pairs from a Twitter corpus to be included in the EmoSim508 dataset, annotators have rated them as not related. However, we can also see that the unrelated emoji only cover 29.7% (153/508 for Q1) of the dataset, leaving 70.3% of the dataset with diverse relationships.

5 Evaluating Emoji Embedding Models

In this section, we discuss how we evaluated the different emoji embedding models using EmoSim508 as a gold standard dataset. We generated nine ranked lists of emoji pairs based on emoji similarity scores, one ranked list representing the EmoSim508 emoji similarity and eight ranked lists representing each emoji embedding model obtained under different corpus settings. Treating EmoSim508’s emoji similarity ranks as our ground truth emoji rankings, we use Spearman’s rank correlation coefficient (Spearman’s $\rho$; https://goo.gl/ZA4zDP) to evaluate how well the emoji similarity rankings generated by our emoji embedding models align with the emoji similarity rankings of the gold standard dataset. We used Spearman’s $\rho$ because we noticed that our emoji annotation distribution does not follow a normal distribution. The rank correlation obtained for each setting (multiplied by 100 for display purposes) is shown in Table 3. Based on the rank correlation results, we notice that emoji embedding models learned over emoji descriptions moderately correlate () with the gold standard results, whereas all other models show a strong correlation (). All results reported in Table 3 are statistically significant ().

Emoji Embedding Model    ρ × 100 for each corpus
                         Google News    Twitter
---------------------    -----------    -------
(Sense_Desc.)            49.0           46.6
(Sense_Label)            76.0           70.2
(Sense_Def.)             69.5           66.9
(Sense_All)              71.2           67.7

Table 3: Spearman’s Rank Correlation Results
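As a reminder of what the reported coefficient measures, Spearman’s rank correlation can be sketched as the Pearson correlation of the two rank lists, with tied values receiving their average rank. The scores below are invented for illustration and are not taken from EmoSim508 or from any of the embedding models.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four emoji pairs.
gold_similarity = [3.85, 0.10, 2.00, 1.50]    # human-annotated scores
model_similarity = [0.91, 0.20, 0.55, 0.60]   # scores from an embedding model
rho = spearman_rho(gold_similarity, model_similarity)
```

Multiplying `rho` by 100 gives a value on the same scale as the entries in Table 3.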

We observe that the emoji embeddings learned on sense labels correlate best with the emoji similarity rankings of the gold standard dataset. We further investigated why the emoji sense label-based embedding models (Sense_Label) outperform the other models. Past work suggests that having access to lengthy emoji sense definitions could improve the performance of the emoji embedding models [9, 30]. For the 158 emoji in the EmoSim508 dataset, emoji meanings were represented using 25 words on average when using the emoji descriptions; 12 words when using the emoji sense labels; 567 words when using the emoji sense definitions; and 606 words when all of the above definitions were combined. All our emoji embedding definitions have more words (at least twice as many) than past work [9], but we notice that emoji sense labels are very specific words that only describe emoji meanings. In contrast, emoji descriptions and emoji sense definitions contain additional words describing how an emoji is shown on different platforms or how an emoji should be used in a sentence. These unrelated words in emoji definitions may well be the reason for the degraded performance of the (Sense_Desc.), (Sense_Def.) and (Sense_All) embeddings. Thus, access to quality sense labels is of vital importance for learning good emoji embeddings.

6 Emoji Embeddings at Work

To show that our emoji embedding models can be used in real-world NLP tasks, we set up a sentiment analysis experiment using the gold standard dataset used in [28]. (Please note that our main goal is to demonstrate the usefulness of the learned embedding models and not to develop a state-of-the-art sentiment analysis algorithm.) We selected this dataset because Barbieri et al.’s [4] and Eisner et al.’s [9] models have already been evaluated on it, which allows us to compare our embedding models with theirs. The gold standard dataset consists of nearly 66,000 English tweets, labelled manually for positive, neutral or negative sentiment. The dataset is divided into a training set that consists of 51,679 tweets, of which 11,700 contain emoji, and a testing set that consists of 12,920 tweets, of which 2,295 contain emoji. In both the training set and the test set, 46% of tweets are labeled neutral, 29% are labeled positive, and 25% are labeled negative. Thus, the dataset is reasonably balanced.
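
As a rough sanity check on the class proportions quoted above, the sketch below computes two simple quantities: the accuracy of a classifier that always predicts the most frequent class, and the entropy of the label distribution normalized by its maximum. Both function names are hypothetical, and the normalized entropy is only one of several possible ways to quantify balance; the paper does not state which criterion it used.

```python
import math

# Class proportions as reported for the gold standard dataset.
class_share = {"neutral": 0.46, "positive": 0.29, "negative": 0.25}

def majority_baseline(shares):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(shares.values())

def normalized_entropy(shares):
    """Label entropy divided by its maximum (1.0 means perfectly balanced)."""
    entropy = -sum(p * math.log2(p) for p in shares.values() if p > 0)
    return entropy / math.log2(len(shares))

baseline = majority_baseline(class_share)
balance = normalized_entropy(class_share)
print(f"majority-class baseline: {100 * baseline:.1f}%")
print(f"normalized entropy: {balance:.3f}")
```

The majority-class figure gives a floor against which classification accuracies on the full dataset can be read, and a normalized entropy close to 1 is consistent with describing the label distribution as reasonably balanced.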

Classification accuracy on testing dataset
Word Embedding Model           N = 12,920     N = 2,295      N = 2,186      N = 308
                               RF     SVM     RF     SVM     RF     SVM     RF     SVM
Google News + emoji2vec        59.5   60.5    54.4   59.2    55.0   59.5    54.5   55.2
Google News + (Sense_Desc.)    58.7   61.9    50.6   55.0    49.7   55.3    45.4   50.0
Twitter + (Sense_Desc.)        60.2   62.5    55.1   56.7    53.8   57.3    53.5   53.2
Google News + (Sense_Label)    60.3   63.3    55.0   61.8    56.8   62.3    54.2   59.0
Twitter + (Sense_Label)        60.7   63.6    57.3   60.8    57.5   61.5    56.1   58.4
Google News + (Sense_Def.)     59.0   62.2    50.3   55.0    51.1   55.2    48.0   50.6
Twitter + (Sense_Def.)         60.0   62.4    53.6   56.2    53.7   56.7    50.6   50.6
Google News + (Sense_All)      59.1   62.2    50.8   55.1    50.2   55.3    50.0   50.6
Twitter + (Sense_All)          60.3   62.4    53.1   57.6    54.1   56.8    54.5   50.0
Table 4: Accuracy of the Sentiment Analysis task using Emoji Embeddings

To represent a training instance in our sentiment analysis dataset, we replaced every word in a tweet with its embedding from a word embedding model learned over a particular text corpus, and every emoji in the tweet with its representation from a particular emoji embedding model that we learned. Table 4 shows the results we obtained for the sentiment analysis task when using different emoji embeddings. Here, Google News + (Sense_Desc.) means that all words in the tweets in the gold standard dataset are replaced by their corresponding word embeddings learned over the Google News corpus and all emoji are replaced by their corresponding emoji embeddings obtained from the (Sense_Desc.) model. We report classification accuracies for: (i) the whole testing dataset (N = 12,920); (ii) all tweets with emoji (N = 2,295); (iii) tweets containing the 90% most frequently used emoji in the test set (N = 2,186); and (iv) tweets containing the 10% least frequently used emoji in the test set (N = 308). We trained a Random Forest (RF) classifier and a Support Vector Machine (SVM) classifier for each test data segment. We selected these two classifiers because they are commonly used for text classification problems, including the sentiment analysis experiment conducted by Eisner et al. [9] on the same gold standard dataset.

Following Eisner et al. [9], we report the accuracy of the sentiment analysis task, which allows us to compare our embedding models with theirs. Accuracy is measured in settings where the testing dataset is divided into four groups based on the availability of tweets with emoji in each group. We find that the embeddings learned over emoji sense labels perform best in the sentiment analysis task, outperforming the previous best emoji embedding model [9] with an improvement of 7.73%. This embedding model also yielded the best similarity ranking as per Spearman’s Rank Correlation test.
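
The feature construction described above can be sketched as follows: each token of a tweet is looked up in the emoji embedding model first and in the word embedding model otherwise, and the resulting vectors are combined into one fixed-length feature vector that a classifier such as RF or SVM could consume. The text does not state how the per-token vectors are combined, so averaging is an assumption made here for illustration; the function names and the two-dimensional toy vectors are also hypothetical.

```python
def tweet_vector(tokens, word_vectors, emoji_vectors, dim):
    """Average the embeddings of all tokens that have a known vector.

    Assumption: vectors are averaged; tokens missing from both models are skipped.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = emoji_vectors.get(token)
        if vec is None:
            vec = word_vectors.get(token)
        if vec is None:
            continue  # out-of-vocabulary token
        total = [t + x for t, x in zip(total, vec)]
        count += 1
    if count == 0:
        return total  # all-zero vector for tweets with no known tokens
    return [t / count for t in total]

def accuracy(predicted, gold):
    """Percentage of predictions that match the gold labels."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return 100.0 * correct / len(gold)

# Toy word and emoji embedding models (invented 2-dimensional values).
toy_word_vectors = {"love": [1.0, 0.0], "this": [0.0, 1.0]}
toy_emoji_vectors = {"\U0001F602": [1.0, 1.0]}  # face with tears of joy

features = tweet_vector(
    ["love", "this", "\U0001F602", "unseenword"],
    toy_word_vectors,
    toy_emoji_vectors,
    dim=2,
)
print(features)

# Accuracy on a toy segment of four tweets (invented labels).
toy_accuracy = accuracy(["pos", "neg", "neu", "pos"], ["pos", "neg", "pos", "pos"])
print(toy_accuracy)
```

Swapping `toy_emoji_vectors` for a different emoji embedding model while keeping the word vectors fixed mirrors how the rows of Table 4 differ from one another; computing `accuracy` on each subset of test tweets mirrors its four column groups.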

As discussed in Section 5, we believe that the inclusion of words that are highly related to emoji meanings enables the embeddings learned over sense labels to better represent the meaning of an emoji and, hence, to outperform the other models in the sentiment analysis task. We also notice that Twitter-based emoji embedding models outperform Google News-based embedding models in the majority of the test run settings. Past research on social media text processing suggests that NLP tools designed for social media text processing outperform NLP tools designed for well-formed text processing on the same task [38]. We believe this could be the reason why Twitter-based models outperform Google News-based models. Our results, in which the sense label-based models consistently outperform Eisner et al.’s model [9], indicate that using emoji descriptions, sense labels, and emoji definitions to model emoji meanings can lead to better emoji embedding models.

7 Conclusion

This paper presented how the semantic similarity of emoji can be calculated by utilizing machine-readable emoji sense definitions. Using the emoji descriptions, emoji sense labels and emoji sense definitions extracted from EmojiNet, and using two different training corpora obtained from Twitter and Google News, we explored multiple emoji embedding models to measure emoji similarity. With the help of ten human annotators who are knowledgeable about emoji, we created the EmoSim508 dataset, which consists of 508 emoji pairs, and used it as the gold standard to evaluate how well our emoji embedding models perform in an emoji similarity calculation task. To show a real-world use-case of the learned emoji embedding models, we used them in a sentiment analysis task and presented the results. We released the EmoSim508 dataset and our emoji embedding models with our paper. This is the first effort to explore the use of a machine-readable emoji sense inventory and distributional semantic models to learn emoji embeddings.

In the future, we would like to extend our emoji embedding models to understand the differences in emoji interpretations due to how they appear across different platforms or devices. We would also like to apply our emoji embedding models to other emoji analysis tasks such as emoji-based search. Specifically, we would like to explore whether emoji similarity results could be used to improve recall in emoji-based search applications.


Acknowledgments

We are thankful to the annotators who helped us in creating the EmoSim508 dataset. We acknowledge partial support from the National Institute on Drug Abuse (NIDA) Grant No. 5R01DA039454-03: “Trending: Social Media Analysis to Monitor Cannabis and Synthetic Cannabinoid Use”, and the National Science Foundation (NSF) award: CNS-1513721: “Context-Aware Harassment Detection on Social Media”. Points of view or opinions in this document are those of the authors and do not necessarily represent the official position or policies of the NIDA or NSF.


References

  • [1] Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Computational Linguistics 34(4), 555–596 (2008)
  • [2] Balasuriya, L., Wijeratne, S., Doran, D., Sheth, A.: Finding street gang members on twitter. In: The 2016 IEEE/ACM Intl. Conf. on Advances in Social Networks Analysis and Mining (ASONAM). vol. 8, pp. 685–692 (August 2016)
  • [3] Barbieri, F., Kruszewski, G., Ronzano, F., Saggion, H.: How cosmopolitan are emojis?: Exploring emojis usage and meaning over different languages with distributional semantics. In: Proc. of the 2016 ACM on Multimedia Conference (MM). pp. 531–535 (2016)
  • [4] Barbieri, F., Ronzano, F., Saggion, H.: What does this emoji mean? a vector space skip-gram model for twitter emojis. In: LREC (2016)
  • [5] Camacho-Collados, J., Pilehvar, M.T., Navigli, R.: Nasari: a novel approach to a semantically-aware representation of items. In: Human Language Technologies: The 2015 Annual Conf. of the North American Chapter of the ACL (HLT-NAACL). pp. 567–577 (2015)
  • [6] Cappallo, S., Mensink, T., Snoek, C.G.: Query-by-emoji video search. In: Proc. of the 23rd ACM Intl. Conf. on Multimedia. pp. 735–736 (2015)
  • [7] Cramer, H., de Juan, P., Tetreault, J.: Sender-intended functions of emojis in us messaging. In: Proc. of the 18th Intl. Conf. on Human-Computer Interaction with Mobile Devices and Services (MobileHCI). pp. 504–509 (2016)
  • [8] Dimson, T.: Emojineering part 1: Machine learning for emoji trends. Instagram Engineering Blog (2015)
  • [9] Eisner, B., Rocktäschel, T., Augenstein, I., Bošnjak, M., Riedel, S.: emoji2vec: Learning emoji representations from their description. In: Proc. of the 4th Intl. Workshop on NLP for Social Media at EMNLP (SocialNLP) (2016)
  • [10] Guha, R., McCool, R., Miller, E.: Semantic search. In: Proc. of the 12th Intl. Conf. on World Wide Web (WWW). pp. 700–709 (2003)
  • [11] Harris, Z.S.: Distributional structure. Word 10(2-3), 146–162 (1954)
  • [12] Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Communication methods and measures (2007)
  • [13] Hill, F., Reichart, R., Korhonen, A.: Simlex-999: Evaluating semantic models with (genuine) similarity estimation. In: Computational Linguistics
  • [14] Hu, Y., Talamadupula, K., Kambhampati, S.: Dude, srsly?: The surprisingly formal nature of twitter’s language. In: 7th Intl. AAAI Conf. on Web and Social Media (ICWSM). pp. 244–253. Boston, MA, USA (2013)
  • [15] Huang, E.H., Socher, R., Manning, C.D., Ng, A.Y.: Improving word representations via global context and multiple word prototypes. In: Proc. of the 50th Meeting of the Association for Computational Linguistics (ACL). pp. 873–882 (2012)
  • [16] Kelly, R., Watts, L.: Characterising the inventive appropriation of emoji as relationally meaningful in mediated close personal relationships. In: Experiences of Technology Appropriation (2015)
  • [17] Kenter, T., Borisov, A., de Rijke, M.: Siamese cbow: Optimizing word embeddings for sentence representations. In: Proc. of the 54th Meeting of the Association for Computational Linguistics (ACL). pp. 941–951 (2016)
  • [18] Levy, O., Goldberg, Y., Dagan, I.: Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics 3, 211–225 (2015)
  • [19] Likert, R.: A technique for the measurement of attitudes. Archives of psychology (1932)
  • [20] Ljubešic, N., Fišer, D.: A global analysis of emoji usage (2016)
  • [21] Maaten, L.v.d., Hinton, G.: Visualizing data using t-sne. Journal of Machine Learning Research 9(Nov), 2579–2605 (2008)
  • [22] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013), http://arxiv.org/abs/1301.3781
  • [23] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (NIPS). pp. 3111–3119 (2013)
  • [24] Mikolov, T., Yih, W.t., Zweig, G.: Linguistic regularities in continuous space word representations. In: Human Language Technologies: The 2013 Annual Conf. of the North American Chapter of the ACL (HLT-NAACL). vol. 13, pp. 746–751 (2013)
  • [25] Miller, H., Thebault-Spieker, J., Chang, S., Johnson, I., Terveen, L., Hecht, B.: “blissfully happy” or “ready to fight”: Varying interpretations of emoji. 10th Intl. AAAI Conf. on Web and Social Media (ICWSM) pp. 259–268 (2016)
  • [26] Navigli, R.: Word sense disambiguation: A survey. ACM Computing Surveys (CSUR) 41(2),  10 (2009)
  • [27] Navigli, R., Ponzetto, S.P.: Babelnet: Building a very large multilingual semantic network. In: Proc. of the 48th meeting of the Association for Computational Linguistics (ACL). pp. 216–225 (2010)
  • [28] Novak, P.K., Smailović, J., Sluban, B., Mozetič, I.: Sentiment of emojis. PloS one 10(12), e0144296 (2015)
  • [29] Pavalanathan, U., Eisenstein, J.: More emojis, less:) the competition for paralinguistic function in microblog writing (2016)
  • [30] Pohl, H., Domin, C., Rohs, M.: Beyond just text: Semantic emoji similarity modeling to support expressive communication. ACM Transactions on Computer-Human Interaction (TOCHI) 24(1),  6 (2017)
  • [31] Revilla, M.A., Saris, W.E., Krosnick, J.A.: Choosing the number of categories in agree–disagree scales. In: Sociological Methods & Research (2014)
  • [32] Snow, R., O’Connor, B., Jurafsky, D., Ng, A.Y.: Cheap and fast—but is it good?: evaluating non-expert annotations for natural language tasks. In: Proc. of the Conf. on Empirical Methods in Natural Language Processing (EMNLP). pp. 254–263 (2008)
  • [33] Tigwell, G.W., Flatla, D.R.: Oh that’s what you meant!: reducing emoji misunderstanding. In: Proc. of the 18th Intl. Conf. on Human-Computer Interaction with Mobile Devices and Services Adjunct. pp. 859–866 (2016)
  • [34] Wang, W., Chen, L., Thirunarayan, K., Sheth, A.P.: Harnessing twitter "big data" for automatic emotion identification. In: Privacy, Security, Risk and Trust, 2012 Intl. Conf. on Social Computing (SocialCom). pp. 587–592 (2012)
  • [35] Weaver, W.: Translation. Machine translation of languages 14 (1955)
  • [36] Wijeratne, S., Balasuriya, L., Doran, D., Sheth, A.: Word embeddings to enhance twitter gang member profile identification. In: IJCAI Workshop on Semantic Machine Learning (SML). pp. 18–24. New York City (07 2016)
  • [37] Wijeratne, S., Balasuriya, L., Sheth, A., Doran, D.: Emojinet: Building a machine readable sense inventory for emoji. In: 8th Intl. Conf. on Social Informatics (SocInfo). pp. 527–541. Bellevue, WA, USA (November 2016)
  • [38] Wijeratne, S., Balasuriya, L., Sheth, A., Doran, D.: Emojinet: An open service and api for emoji sense discovery. In: 11th Intl. AAAI Conf. on Web and Social Media (ICWSM). pp. 437–446. Montreal, Canada (May 2017)