Harnessing Code Switching to Transcend the Linguistic Barrier

01/30/2020
by Ashiqur R. KhudaBukhsh, et al.

Code mixing (or code switching) is a common phenomenon in social-media content generated by a linguistically diverse user base. Studies show that in the Indian subcontinent, a substantial fraction of social-media posts exhibit code switching. The difficulties that code-mixed documents pose for downstream analyses are well understood. However, giving code-mixed documents visibility in certain scenarios may have a previously overlooked utility. For instance, a document written in a mixture of languages can be partially accessible to a wider audience. This could be particularly useful when a considerable fraction of the audience lacks fluency in one of the component languages. In this paper, we provide a systematic approach to sampling code-mixed documents that leverages a polyglot-embedding-based method and requires minimal supervision. In the context of the 2019 India-Pakistan conflict triggered by the Pulwama terror attack, we demonstrate an untapped potential of harnessing code mixing for human well-being. We start from an existing hostility-diffusing hope-speech classifier trained solely on English documents. We then use code-mixed documents as a bridge to retrieve hope-speech content written in a low-resource but widely used language, Romanized Hindi. The proposed pipeline holds promise for substantially reducing web-moderation effort.
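The abstract does not spell out how the polyglot embedding is used to sample code-mixed documents. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes three things. First, each document is represented by the average of its word vectors from a single embedding trained on the whole multilingual corpus. Second, monolingual documents of each language form their own cluster in that space. Third, code-mixed documents tend to fall between the clusters. The helper names (`doc_embeddings`, `two_means`, `mixing_scores`), the equidistance score, and the toy 2-D vectors are all hypothetical illustrations.

```python
import numpy as np

def doc_embeddings(docs, word_vecs, dim):
    """Average the (assumed) polyglot vectors of in-vocabulary tokens; zero vector if none."""
    out = np.zeros((len(docs), dim))
    for i, doc in enumerate(docs):
        vecs = [word_vecs[tok] for tok in doc.lower().split() if tok in word_vecs]
        if vecs:
            out[i] = np.mean(vecs, axis=0)
    return out

def two_means(X, iters=50):
    """Plain 2-means (one cluster per language) with deterministic farthest-point init."""
    first = X[0]
    second = X[np.argmax(np.linalg.norm(X - first, axis=1))]
    centers = np.stack([first, second]).astype(float)
    for _ in range(iters):
        # Distance of every document to each of the two centers.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def mixing_scores(X, centers):
    """Hypothetical score: near 1 if a document is equidistant from both clusters, near 0 if it sits on one."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return 1.0 - np.abs(d[:, 0] - d[:, 1]) / (d[:, 0] + d[:, 1] + 1e-12)

# Toy 2-D "polyglot" vectors (hypothetical): English words on one axis, Romanized Hindi on the other.
word_vecs = {
    "we": np.array([1.0, 0.0]), "want": np.array([1.0, 0.0]), "peace": np.array([1.0, 0.0]),
    "hum": np.array([0.0, 1.0]), "aman": np.array([0.0, 1.0]), "chahte": np.array([0.0, 1.0]),
}
docs = [
    "we want peace", "peace we want", "want peace",
    "hum aman chahte", "aman chahte hum", "hum aman",
    "we want aman chahte",  # code-mixed example
]
X = doc_embeddings(docs, word_vecs, dim=2)
scores = mixing_scores(X, two_means(X))
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.2f}  {doc}")
```

In this toy run the mixed document gets the highest score. With real data the vectors would have hundreds of dimensions and the clusters would overlap. A threshold on such a score would therefore only surface candidates for review rather than assign definitive labels, and that thresholding step is likewise an assumption.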
