Preparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of Indian Languages

03/11/2018
by Soumil Mandal, et al.

Analysis of the informative content and sentiment of social media users has been attempted intensively in the recent past. Most systems are usable only for monolingual data and fail or give poor results when applied to code-mixed data. To draw attention to this problem and encourage researchers to work on it, we prepared gold standard Bengali-English code-mixed data with language and polarity tags for sentiment analysis. In this paper, we discuss the systems we built to collect and filter raw Twitter data. To reduce manual work during annotation, hybrid systems combining rule-based and supervised models were developed for both language and sentiment tagging. The final corpus was annotated by a group of annotators following a few guidelines. The resulting gold standard corpus shows high inter-annotator agreement in terms of Kappa values. Metrics such as the Code-Mixed Index (CMI) and Code-Mixed Factor (CF), along with various aspects (language and emotion), were also used to qualitatively assess the code-mixed and sentiment properties of the corpus.
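To make the Code-Mixed Index mentioned above concrete, here is a minimal sketch of an utterance-level CMI. It assumes the commonly cited formulation CMI = 100 * (1 - max(w_i) / (n - u)), where n is the number of tokens, u the number of language-independent tokens, and max(w_i) the token count of the dominant language. The abstract does not state the exact variant used for this corpus, and the tag names ("bn", "en", "univ") and the function name are hypothetical, chosen only for illustration.

```python
from collections import Counter


def code_mixed_index(lang_tags, neutral_tags=frozenset({"univ"})):
    """Utterance-level CMI from a list of per-token language tags.

    Assumed formulation (not confirmed by the abstract):
        CMI = 100 * (1 - max(w_i) / (n - u))  if n > u, else 0
    """
    n = len(lang_tags)
    # Count only tokens that carry a language tag (hypothetical tag set).
    lang_counts = Counter(t for t in lang_tags if t not in neutral_tags)
    u = n - sum(lang_counts.values())  # language-independent tokens
    if n == u:
        # No language-tagged tokens at all: treat as no mixing.
        return 0.0
    dominant = max(lang_counts.values())
    return 100.0 * (1 - dominant / (n - u))


# Hypothetical tagged tweet: three Bengali tokens, two English, one neutral.
tags = ["bn", "bn", "en", "univ", "en", "bn"]
print(code_mixed_index(tags))
```

A monolingual utterance scores 0 under this formulation, and the score grows as the dominant language's share of the tagged tokens shrinks.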

research
05/30/2020

A Sentiment Analysis Dataset for Code-Mixed Malayalam-English

There is an increasing demand for sentiment analysis of text from social...
research
12/14/2014

Recurrent-Neural-Network for Language Detection on Twitter Code-Switching Corpus

Mixed language data is one of the difficult yet less explored domains of...
research
05/30/2020

Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text

Understanding the sentiment of a comment from a video or an image is an ...
research
01/22/2021

CMSAOne@Dravidian-CodeMix-FIRE2020: A Meta Embedding and Transformer model for Code-Mixed Sentiment Analysis on Social Media Text

Code-mixing(CM) is a frequently observed phenomenon that uses multiple l...
research
02/25/2021

Sentiment Analysis of Persian-English Code-mixed Texts

The rapid production of data on the internet and the need to understand ...
research
11/18/2021

Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text

We present the results of the Dravidian-CodeMix shared task held at FIRE...
research
03/27/2020

Semantic Enrichment of Nigerian Pidgin English for Contextual Sentiment Classification

Nigerian English adaptation, Pidgin, has evolved over the years through ...