Normalization of Transliterated Words in Code-Mixed Data Using Seq2Seq Model & Levenshtein Distance

05/22/2018
by   Soumil Mandal, et al.
0

Building tools for code-mixed data is rapidly gaining popularity in the NLP research community as such data is exponentially rising on social media. Working with code-mixed data contains several challenges, especially due to grammatical inconsistencies and spelling variations in addition to all the previous known challenges for social media scenarios. In this article, we present a novel architecture focusing on normalizing phonetic typing variations, which is commonly seen in code-mixed data. One of the main features of our architecture is that in addition to normalizing, it can also be utilized for back-transliteration and word identification in some cases. Our model achieved an accuracy of 90.27

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/03/2018

Automatic Normalization of Word Variations in Code-Mixed Social Media Text

Social media platforms such as Twitter and Facebook are becoming popular...
research
11/15/2016

Recurrent Neural Network based Part-of-Speech Tagger for Code-Mixed Social Media Text

This paper describes Centre for Development of Advanced Computing's (CDA...
research
06/14/2018

Gender Prediction in English-Hindi Code-Mixed Social Media Content : Corpus and Baseline System

The rapid expansion in the usage of social media networking sites leads ...
research
03/10/2018

Language Identification of Bengali-English Code-Mixed data using Character & Phonetic based LSTM Models

Language identification of social media text still remains a challenging...
research
10/16/2018

Strategies for Language Identification in Code-Mixed Low Resource Languages

In the recent years, substantial work has been done on language tagging ...
research
08/21/2018

Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture

An accurate language identification tool is an absolute necessity for bu...
research
04/11/2016

Shallow Parsing Pipeline for Hindi-English Code-Mixed Social Media Text

In this study, the problem of shallow parsing of Hindi-English code-mixe...

Please sign up or login with your details

Forgot password? Click here to reset