Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

03/27/2021
by Akshat Gupta, et al.

Sentiment analysis is an important task in understanding social media content such as customer reviews and Twitter and Facebook feeds. In multilingual communities around the world, a large amount of social media text is characterized by code-switching. It has therefore become important to build models that can handle code-switched data. However, annotated code-switched data is scarce, which creates a need for unsupervised models and algorithms. We propose a general framework called Unsupervised Self-Training and show its application to the specific use case of sentiment analysis of code-switched data. We use pre-trained BERT models for initialization and fine-tune them in an unsupervised manner, using only pseudo labels produced by zero-shot transfer. We test our algorithm on multiple code-switched languages and provide a detailed analysis of its learning dynamics, with the aim of answering the question: 'Does our unsupervised model understand the code-switched languages, or does it just learn its representations?' Our unsupervised models compete well with their supervised counterparts: their weighted F1 scores come within 1-7% of those of supervised models trained for a two-class problem.
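The self-training loop described above can be sketched in a few lines. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: a multinomial Naive Bayes classifier stands in for the fine-tuned BERT model, a tiny hypothetical lexicon (zero_shot_teacher, POS_WORDS, NEG_WORDS) stands in for the zero-shot transfer model, and keeping the most confident fraction of each pseudo-labelled class (select_confident, fraction) is an assumed selection rule. All function names and the Hindi-English example reviews are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for a zero-shot transfer model
# (e.g. a sentiment classifier trained on another language, applied as-is).
POS_WORDS = {"good", "great", "love", "accha", "badhiya"}
NEG_WORDS = {"bad", "hate", "worst", "bekar", "bura"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def zero_shot_teacher(text):
    """Hypothetical teacher: (label, confidence) from lexicon hits; confidence in [0.5, 1)."""
    toks = tokenize(text)
    score = sum(t in POS_WORDS for t in toks) - sum(t in NEG_WORDS for t in toks)
    label = "pos" if score >= 0 else "neg"
    return label, 1 - 0.5 / (1 + abs(score))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (stand-in for fine-tuned BERT)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        vocab = set()
        for text, y in zip(texts, labels):
            toks = tokenize(text)
            self.counts[y].update(toks)
            vocab.update(toks)
        self.vocab_size = len(vocab)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        log_scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for tok in tokenize(text):
                s += math.log((self.counts[c][tok] + 1) / (self.totals[c] + self.vocab_size))
            log_scores[c] = s
        # Softmax over log scores, shifted by the max for numerical stability.
        top = max(log_scores.values())
        z = sum(math.exp(v - top) for v in log_scores.values())
        return {c: math.exp(v - top) / z for c, v in log_scores.items()}


def select_confident(texts, predictions, fraction):
    """Assumed rule: keep the top `fraction` most confident examples of each predicted class."""
    by_class = defaultdict(list)
    for text, (label, conf) in zip(texts, predictions):
        by_class[label].append((conf, text))
    chosen_texts, chosen_labels = [], []
    for label, items in by_class.items():
        items.sort(key=lambda item: item[0], reverse=True)
        for _, text in items[: max(1, int(len(items) * fraction))]:
            chosen_texts.append(text)
            chosen_labels.append(label)
    return chosen_texts, chosen_labels


def self_train(unlabeled, rounds=2, fraction=0.7):
    """Unsupervised self-training: no gold labels are ever used."""
    # Initial pseudo labels come only from the zero-shot teacher.
    preds = [zero_shot_teacher(t) for t in unlabeled]
    model = None
    for _ in range(rounds):
        texts, labels = select_confident(unlabeled, preds, fraction)
        model = NaiveBayes().fit(texts, labels)
        # The student re-labels the whole pool; these become the next round's pseudo labels.
        preds = []
        for t in unlabeled:
            proba = model.predict_proba(t)
            best = max(proba, key=proba.get)
            preds.append((best, proba[best]))
    return model


# Usage on hypothetical, unlabeled Hindi-English code-switched reviews.
reviews = [
    "film accha hai love it",
    "story accha thi",
    "acting badhiya aur songs great",
    "film bekar hai worst",
    "story bekar thi",
    "acting bura aur songs bad",
]
model = self_train(reviews)
proba = model.predict_proba("accha film")
print(max(proba, key=proba.get))  # pos
```

Filtering by confidence within each class, rather than globally, is meant to keep one class from crowding out the other in the pseudo-labelled training set; the abstract does not say how (or whether) the paper balances classes.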

research
11/15/2021

IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages

Sentiment analysis of social media posts and comments for various market...
research
06/08/2020

CS-Embed-francesita at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis

The growing popularity and applications of sentiment analysis of social ...
research
12/06/2021

Zero-shot hashtag segmentation for multilingual sentiment analysis

Hashtag segmentation, also known as hashtag decomposition, is a common s...
research
01/20/2020

Unsupervised Sentiment Analysis for Code-mixed Data

Code-mixing is the practice of alternating between two or more languages...
research
12/15/2016

A Simple Approach to Multilingual Polarity Classification in Twitter

Recently, sentiment analysis has received a lot of attention due to the ...
research
11/03/2021

A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining

User-generated content from social media is produced in many languages, ...
research
08/18/2021

FeelsGoodMan: Inferring Semantics of Twitch Neologisms

Twitch chats pose a unique problem in natural language understanding due...