On the logistical difficulties and findings of Jopara Sentiment Analysis

This paper addresses the problem of sentiment analysis for Jopara, a code-switching variety mixing Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data even for relatively easy-to-annotate tasks, such as sentiment analysis. We then train a set of neural models, including pre-trained language models, and explore whether they outperform traditional machine learning models in this low-resource setup. Transformer architectures obtain the best results, despite Guarani not being included in their pre-training, but traditional machine learning models remain competitive due to the low-resource nature of the problem.
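As an illustration of the kind of traditional machine learning baseline the abstract refers to, the sketch below implements a bag-of-words Naive Bayes sentiment classifier using only the Python standard library. The mixed Guarani/Spanish "tweets" are invented toy placeholders, not examples from the actual corpus, and the feature set and smoothing choices are assumptions for illustration only.

```python
# Minimal sketch of a traditional ML baseline for code-switched
# sentiment analysis: bag-of-words Naive Bayes with Laplace smoothing,
# standard library only. Training data below is invented toy text.
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Whitespace tokenization; real systems would handle punctuation,
    # emojis, and orthographic variation in code-switched text.
    return text.lower().split()

def train_nb(examples):
    """examples: list of (text, label). Returns (priors, word counts, vocab)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict(model, text):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            # Add-one (Laplace) smoothing for unseen words.
            lp += math.log((word_counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical mixed Guarani/Spanish examples (illustration only).
train = [
    ("che gusta mucho", "pos"),
    ("muy bueno iporã", "pos"),
    ("no me gusta vai", "neg"),
    ("muy malo vai", "neg"),
]
model = train_nb(train)
print(predict(model, "iporã bueno"))  # prints "pos" on this toy data
```

Such a baseline has few parameters and no pre-training requirements, which is one reason simple models stay competitive with transformers when labeled and unlabeled data are both scarce.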

