Quality of Word Embeddings on Sentiment Analysis Tasks

03/06/2020
by Erion Çano, et al.

Word embeddings, or distributed representations of words, are used in applications such as machine translation, sentiment analysis, and topic identification. The quality of word embeddings and the performance of the applications built on them depend on several factors, including training method, corpus size, and relevance. In this study we compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks. According to our results, Twitter Tweets is the best on lyrics sentiment analysis, whereas Google News and Common Crawl are the top performers on movie polarity analysis. GloVe-trained models slightly outperform those trained with Skip-gram. Factors such as topic relevance and corpus size also significantly affect the quality of the models. When medium-sized or large text sets are available, obtaining word embeddings from the same training dataset is usually the best choice.
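To make the setting concrete, here is a minimal sketch of how pretrained word vectors can feed a sentiment classifier. It is an illustration under stated assumptions, not the pipeline used in the study: the tiny `EMBEDDINGS` table, its values, the averaging step, and the nearest-centroid classifier are all hypothetical stand-ins chosen so the example runs without downloading a model.

```python
import numpy as np

# Toy stand-in for a pretrained embedding table (hypothetical values).
# In practice these vectors would be loaded from a pretrained model file.
EMBEDDINGS = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "love":  np.array([0.7, 0.0, 0.2]),
    "bad":   np.array([-0.9, 0.1, 0.0]),
    "awful": np.array([-0.8, 0.2, 0.1]),
    "hate":  np.array([-0.7, 0.0, 0.2]),
    "movie": np.array([0.0, 0.9, 0.3]),
    "song":  np.array([0.0, 0.3, 0.9]),
}
DIM = 3


def embed_text(text):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    vectors = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


def fit_centroids(texts, labels):
    """Compute one mean feature vector per class label."""
    feats = np.array([embed_text(t) for t in texts])
    labels = np.array(labels)
    return {
        lab: feats[labels == lab].mean(axis=0)
        for lab in sorted(set(labels.tolist()))
    }


def predict(text, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    vec = embed_text(text)
    return min(centroids, key=lambda lab: np.linalg.norm(vec - centroids[lab]))


# Tiny hypothetical training set covering both review-like and lyrics-like text.
train_texts = ["good movie", "great song", "bad movie", "awful song"]
train_labels = ["pos", "pos", "neg", "neg"]
centroids = fit_centroids(train_texts, train_labels)
```

Calling `predict("love movie", centroids)` classifies an unseen text by comparing its averaged vector with the class centroids. Swapping the table for a different pretrained model changes only `EMBEDDINGS` and `DIM`, which is the kind of substitution a comparison across embedding models relies on. Words missing from the table are skipped, so vocabulary coverage on the target domain directly affects the features.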

