Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture

08/21/2018
by Soumil Mandal, et al.

An accurate language identification tool is essential for building complex NLP systems that operate on code-mixed data. A lot of work has recently been done on this problem, but there is still room for improvement. Inspired by recent advances in neural network architectures for computer vision tasks, we have implemented multichannel neural networks combining CNN and LSTM for word-level language identification of code-mixed data. Combining this with a Bi-LSTM-CRF context capture module, accuracies of 93.28

research
10/16/2018

Strategies for Language Identification in Code-Mixed Low Resource Languages

In the recent years, substantial work has been done on language tagging ...
research
03/10/2018

Language Identification of Bengali-English Code-Mixed data using Character & Phonetic based LSTM Models

Language identification of social media text still remains a challenging...
research
10/09/2020

Word Level Language Identification in English Telugu Code Mixed Data

In a multilingual or sociolingual configuration Intra-sentential Code Sw...
research
11/23/2020

Evaluating Input Representation for Language Identification in Hindi-English Code Mixed Text

Natural language processing (NLP) techniques have become mainstream in t...
research
05/22/2018

Normalization of Transliterated Words in Code-Mixed Data Using Seq2Seq Model & Levenshtein Distance

Building tools for code-mixed data is rapidly gaining popularity in the ...
research
02/11/2021

A reproduction of Apple's bi-directional LSTM models for language identification in short strings

Language Identification is the task of identifying a document's language...
research
11/14/2018

Melodic Phrase Segmentation By Deep Neural Networks

Automated melodic phrase detection and segmentation is a classical task ...
