FeelsGoodMan: Inferring Semantics of Twitch Neologisms

08/18/2021
by Pavel Dolin, et al.

Twitch chats pose a unique problem in natural language understanding due to a large presence of neologisms, specifically emotes. There are a total of 8.06 million emotes, over 400k of which were used in the week studied. There is virtually no information on the meaning or sentiment of emotes, and with a constant influx of new emotes and drift in their frequencies, it becomes impossible to maintain an up-to-date, manually labeled dataset. Our paper makes a twofold contribution. First, we establish a new baseline for sentiment analysis on Twitch data, outperforming the previous supervised benchmark by 7.9. Second, we introduce a simple but powerful unsupervised framework based on word embeddings and k-NN to enrich existing models with out-of-vocabulary knowledge. This framework allows us to auto-generate a pseudo-dictionary of emotes, and we show that we can nearly match the supervised benchmark above even when injecting such emote knowledge into sentiment classifiers trained on extraneous datasets such as movie reviews or Twitter.
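The abstract gives no implementation details, so the following is only a minimal sketch of how a k-NN lookup over word embeddings could yield a pseudo-dictionary of emote sentiment. It is not the authors' code. The two-dimensional vectors, the small sentiment lexicon, the `FeelsBadMan` entry, and the function names `knn_sentiment` and `build_pseudo_dictionary` are all assumptions made for illustration; a real setup would presumably use embeddings trained on Twitch chat.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_sentiment(token, embeddings, lexicon, k=3):
    """Average the sentiment of the k labeled words closest to `token`."""
    target = embeddings[token]
    scored = [
        (cosine(target, embeddings[word]), word)
        for word in lexicon
        if word in embeddings and word != token
    ]
    neighbours = sorted(scored, reverse=True)[:k]
    if not neighbours:
        return 0.0
    return sum(lexicon[word] for _, word in neighbours) / len(neighbours)

def build_pseudo_dictionary(emotes, embeddings, lexicon, k=3):
    """Map each emote that has an embedding to an inferred sentiment score."""
    return {
        emote: knn_sentiment(emote, embeddings, lexicon, k)
        for emote in emotes
        if emote in embeddings
    }

# Toy 2-D embeddings (hypothetical values, not learned from any data).
EMBEDDINGS = {
    "great": [0.9, 0.1], "love": [0.8, 0.2], "nice": [0.85, 0.15],
    "awful": [-0.9, 0.1], "hate": [-0.8, 0.2], "bad": [-0.85, 0.1],
    "FeelsGoodMan": [0.7, 0.3], "FeelsBadMan": [-0.7, 0.3],
}

# Hypothetical sentiment lexicon for in-vocabulary words (+1 positive, -1 negative).
LEXICON = {
    "great": 1.0, "love": 1.0, "nice": 1.0,
    "awful": -1.0, "hate": -1.0, "bad": -1.0,
}

pseudo_dict = build_pseudo_dictionary(
    ["FeelsGoodMan", "FeelsBadMan"], EMBEDDINGS, LEXICON, k=3
)
print(pseudo_dict)
```

The resulting scores could then be supplied to an existing sentiment classifier as if the emotes were ordinary lexicon entries, which is one plausible reading of "injecting emote knowledge"; the abstract does not say how the injection is actually performed.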

research
03/06/2020

Quality of Word Embeddings on Sentiment Analysis Tasks

Word embeddings or distributed representations of words are being used i...
research
02/02/2019

Word Embeddings for Sentiment Analysis: A Comprehensive Empirical Survey

This work investigates the role of factors like training method, trainin...
research
12/30/2020

Out of Order: How important is the sequential order of words in a sentence in Natural Language Understanding tasks?

Do state-of-the-art natural language understanding models care about wor...
research
01/17/2023

Word Embeddings as Statistical Estimators

Word embeddings are a fundamental tool in natural language processing. C...
research
03/27/2021

Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

Sentiment analysis is an important task in understanding social media co...
research
10/18/2021

SentimentArcs: A Novel Method for Self-Supervised Sentiment Analysis of Time Series Shows SOTA Transformers Can Struggle Finding Narrative Arcs

SOTA Transformer and DNN short text sentiment classifiers report over 97...
