Improved Sentiment Detection via Label Transfer from Monolingual to Synthetic Code-Switched Text

06/13/2019
by Bidisha Samanta, et al.

Multilingual writers and speakers often alternate between two languages in a single discourse, a practice called "code-switching". Existing sentiment detection methods are usually trained on sentiment-labeled monolingual text. Manually labeled code-switched text, especially involving minority languages, is extremely rare. Consequently, the best monolingual methods perform relatively poorly on code-switched text. We present an effective technique for synthesizing labeled code-switched text from labeled monolingual text, which is more readily available. The idea is to replace carefully selected subtrees of constituency parses of sentences in the resource-rich language with suitable token spans selected from automatic translations to the resource-poor language. By augmenting scarce human-labeled code-switched text with plentiful synthetic code-switched text, we achieve significant improvements in sentiment labeling accuracy (1.5 ...) for three language pairs (English-Hindi, English-Spanish and English-Bengali). We also get significant gains for hate speech detection: 4 ... 6 ...
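
The subtree-replacement idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how subtrees or translated spans are selected, so the parse tree, the set of switchable constituent labels, and the span-to-translation mapping below are all hand-written assumptions standing in for a real constituency parser, a machine translation system, and a word aligner. The names `leaves`, `constituents`, and `synthesize` are hypothetical helpers introduced only for this example.

```python
from typing import Dict, Iterator, List, Sequence, Tuple, Union

Tree = Union[str, Tuple[str, list]]  # a leaf token, or (label, children)
Span = Tuple[int, int]               # half-open [start, end) over source tokens


def leaves(tree: Tree) -> List[str]:
    """Return the tokens under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [tok for child in children for tok in leaves(child)]


def constituents(tree: Tree, start: int = 0) -> Iterator[Tuple[str, Span]]:
    """Yield (label, span) for every internal node, in post-order."""
    if isinstance(tree, str):
        return
    label, children = tree
    offset = start
    for child in children:
        yield from constituents(child, offset)
        offset += len(leaves(child))
    yield label, (start, offset)


def synthesize(
    tree: Tree,
    span_translations: Dict[Span, List[str]],
    sentiment: str,
    switch_labels: Sequence[str] = ("NP",),
) -> List[Tuple[str, str]]:
    """Replace one eligible constituent at a time with its translated span.

    Each synthetic sentence inherits the sentiment label of the source
    sentence (the "label transfer" step).
    """
    tokens = leaves(tree)
    out = []
    for label, (i, j) in constituents(tree):
        if label in switch_labels and (i, j) in span_translations:
            mixed = tokens[:i] + span_translations[(i, j)] + tokens[j:]
            out.append((" ".join(mixed), sentiment))
    return out


# Hand-written parse of "the movie was really good" (assumed, not parser output).
parse = ("S", [
    ("NP", [("DT", ["the"]), ("NN", ["movie"])]),
    ("VP", [("VBD", ["was"]),
            ("ADJP", [("RB", ["really"]), ("JJ", ["good"])])]),
])

# Assumed alignment of source spans to spans of a Spanish translation.
aligned = {(0, 2): ["la", "película"], (3, 5): ["realmente", "buena"]}

examples = synthesize(parse, aligned, "positive", switch_labels=("NP", "ADJP"))
for text, label in examples:
    print(label, "|", text)
```

Under these assumptions, each labeled monolingual sentence yields one synthetic code-switched sentence per eligible constituent, all carrying the original label. A fuller pipeline would also need a criterion for choosing which subtrees to switch, which is the part the abstract describes only as "carefully selected".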


research
06/21/2019

A Deep Generative Model for Code-Switched Text

Code-switching, the interleaving of two or more languages within a sente...
research
05/31/2023

Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

Code-switching, also called code-mixing, is the linguistics phenomenon w...
research
01/04/2020

Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text

Nowadays, an abundance of short text is being generated that uses nonsta...
research
03/27/2020

Semantic Enrichment of Nigerian Pidgin English for Contextual Sentiment Classification

Nigerian English adaptation, Pidgin, has evolved over the years through ...
research
10/25/2022

Progressive Sentiment Analysis for Code-Switched Text Data

Multilingual transformer language models have recently attracted much at...
research
11/03/2020

Towards Code-switched Classification Exploiting Constituent Language Resources

Code-switching is a commonly observed communicative phenomenon denoting ...
research
11/28/2018

GIRNet: Interleaved Multi-Task Recurrent State Sequence Models

In several natural language tasks, labeled sequences are available in se...
