Progressive Sentiment Analysis for Code-Switched Text Data

10/25/2022
by Sudhanshu Ranjan, et al.

Multilingual transformer language models have recently attracted much attention from researchers and are used in cross-lingual transfer learning for many NLP tasks such as text classification and named entity recognition. However, similar methods for transfer learning from monolingual text to code-switched text have not been extensively explored, mainly because of two challenges: (1) a code-switched corpus, unlike a monolingual corpus, contains more than one language, so existing methods cannot be applied efficiently; and (2) a code-switched corpus usually mixes a resource-rich language with a low-resource language, so a model built on multilingual pre-trained language models may end up biased toward the resource-rich language. In this paper, we focus on code-switched sentiment analysis where we have a labelled resource-rich language dataset and unlabelled code-switched data. We propose a framework that takes the distinction between the resource-rich and low-resource language into account. Instead of training on the entire code-switched corpus at once, we create buckets based on the fraction of words in the resource-rich language and train progressively, moving from samples dominated by the resource-rich language to samples dominated by the low-resource language. Extensive experiments across multiple language pairs demonstrate that progressive training helps on samples dominated by the low-resource language.
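The bucketing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how words are assigned to a language or how bucket boundaries are chosen, so the vocabulary lookup in `rich_fraction`, the equal-sized split in `make_buckets`, and both function names are assumptions made here for illustration.

```python
from typing import List, Set


def rich_fraction(text: str, rich_vocab: Set[str]) -> float:
    """Fraction of whitespace tokens found in the resource-rich vocabulary.

    Assumption: word-level language identification is approximated by a
    simple vocabulary lookup; the abstract does not specify the method.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in rich_vocab for tok in tokens) / len(tokens)


def make_buckets(texts: List[str], rich_vocab: Set[str],
                 num_buckets: int) -> List[List[str]]:
    """Rank samples by descending resource-rich fraction and split them
    into contiguous, near-equal buckets (assumption: equal-sized split)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    ranked = sorted(texts, key=lambda t: rich_fraction(t, rich_vocab),
                    reverse=True)
    size, extra = divmod(len(ranked), num_buckets)
    buckets, start = [], 0
    for i in range(num_buckets):
        # The first `extra` buckets take one additional sample each.
        end = start + size + (1 if i < extra else 0)
        buckets.append(ranked[start:end])
        start = end
    return buckets


if __name__ == "__main__":
    # Toy English/Hindi example; vocabulary and sentences are made up.
    vocab = {"the", "movie", "was", "great", "i", "loved", "it"}
    corpus = [
        "movie bahut accha tha",
        "the movie was great",
        "yeh film bahut acchi thi",
        "i loved it yaar",
    ]
    for step, bucket in enumerate(make_buckets(corpus, vocab, 2)):
        # A progressive trainer would fine-tune on each bucket in this order.
        print(step, bucket)
```

Under these assumptions, the first bucket holds the samples closest to the labelled resource-rich data, and a training loop would visit the buckets in the order they are returned.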

research
04/16/2021

MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning

The combination of multilingual pre-trained representations and cross-li...
research
06/13/2020

Transferring Monolingual Model to Low-Resource Language: The Case of Tigrinya

In recent years, transformer models have achieved great success in natur...
research
05/21/2022

Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese

Multilingual language models such as mBERT have seen impressive cross-li...
research
05/06/2021

On the logistical difficulties and findings of Jopara Sentiment Analysis

This paper addresses the problem of sentiment analysis for Jopara, a cod...
research
09/14/2022

Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models

As pre-trained language models become more resource-demanding, the inequ...
research
06/13/2019

Improved Sentiment Detection via Label Transfer from Monolingual to Synthetic Code-Switched Text

Multilingual writers and speakers often alternate between two languages ...
research
06/07/2018

Semi-supervised and Transfer learning approaches for low resource sentiment classification

Sentiment classification involves quantifying the affective reaction of ...
