Recurrent Neural Network based Part-of-Speech Tagger for Code-Mixed Social Media Text

11/15/2016
by   Raj Nath Patel, et al.

This paper describes the Centre for Development of Advanced Computing's (CDACM) submission to the shared task 'Tool Contest on POS tagging for Code-Mixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text', co-located with ICON-2016. The shared task was to predict the Part-of-Speech (POS) tag of each word in a given text. Code-mixed text is generated mostly on social media by multilingual users. The presence of multilingual words, transliterations, and spelling variations makes such content linguistically complex. In this paper, we propose an approach to POS tagging code-mixed social media text using a Recurrent Neural Network Language Model (RNN-LM) architecture. We submitted results for Hindi-English (hi-en), Bengali-English (bn-en), and Telugu-English (te-en) code-mixed data.
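
To make the word-level tagging setup concrete, the following is a minimal sketch of an Elman-style recurrent tagger: each token updates a hidden state, and a softmax over that state scores the candidate tags. It is not the authors' implementation. The class name `ElmanTagger`, the toy tagset, the hidden size, the tanh activation, and the random untrained weights are all assumptions made for illustration, so the predicted tags carry no meaning until the model is trained.

```python
import math
import random

# Hypothetical toy tagset; the shared task's real tagset is not given in this abstract.
TAGS = ["NOUN", "VERB", "PRON", "X"]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ElmanTagger:
    """Untrained Elman-style RNN that emits one tag distribution per token."""

    def __init__(self, vocab, tags, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.vocab = {word: i for i, word in enumerate(vocab)}
        self.tags = list(tags)
        self.hidden_size = hidden_size

        def matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # One input row per known word, plus a final row for unknown words.
        self.w_in = matrix(len(vocab) + 1, hidden_size)
        self.w_rec = matrix(hidden_size, hidden_size)
        self.w_out = matrix(hidden_size, len(self.tags))

    def tag(self, tokens):
        """Return a list of (token, best_tag, tag_probabilities)."""
        size = self.hidden_size
        hidden = [0.0] * size
        unknown = len(self.vocab)
        results = []
        for token in tokens:
            row = self.w_in[self.vocab.get(token.lower(), unknown)]
            # Recurrence: new state depends on the current word and the previous state.
            hidden = [
                math.tanh(row[j] + sum(hidden[k] * self.w_rec[k][j]
                                       for k in range(size)))
                for j in range(size)
            ]
            scores = [sum(hidden[k] * self.w_out[k][t] for k in range(size))
                      for t in range(len(self.tags))]
            probs = softmax(scores)
            best = max(range(len(probs)), key=probs.__getitem__)
            results.append((token, self.tags[best], probs))
        return results


if __name__ == "__main__":
    # Romanized Hindi mixed with English, as an illustrative input only.
    tagger = ElmanTagger(vocab=["yeh", "movie", "bahut", "achhi", "thi"], tags=TAGS)
    for token, tag, _ in tagger.tag(["Yeh", "movie", "bahut", "achhi", "thi"]):
        print(token, tag)
```

Because the hidden state is carried from token to token, each prediction can draw on the preceding words, which is useful when neighbouring words switch language or use non-standard spellings. The final input row acts as a catch-all for out-of-vocabulary tokens, a simple stand-in for whatever handling of unseen spellings a real system would need.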


