Strategies for Language Identification in Code-Mixed Low Resource Languages

10/16/2018
by Soumil Mandal, et al.

In recent years, substantial work has been done on language tagging of code-mixed data, but most of it relies on large amounts of data to build models. In this article, we present three strategies for building a word-level language tagger for code-mixed data using very low resources. Each of them achieved an accuracy higher than our baseline model, and the best-performing system reached an accuracy of around 91%. Combining all three, the ensemble system achieved an accuracy of around 92.6%.

research
03/10/2018

Language Identification of Bengali-English Code-Mixed data using Character & Phonetic based LSTM Models

Language identification of social media text still remains a challenging...
research
08/21/2018

Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture

An accurate language identification tool is an absolute necessity for bu...
research
06/08/2023

Leveraging Language Identification to Enhance Code-Mixed Text Classification

The usage of more than one language in the same text is referred to as C...
research
03/23/2023

Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages

While code-mixing is a common linguistic practice in many parts of the w...
research
05/22/2018

Normalization of Transliterated Words in Code-Mixed Data Using Seq2Seq Model & Levenshtein Distance

Building tools for code-mixed data is rapidly gaining popularity in the ...
research
04/03/2019

Subword-Level Language Identification for Intra-Word Code-Switching

Language identification for code-switching (CS), the phenomenon of alter...
research
04/09/2015

Leveraging Twitter for Low-Resource Conversational Speech Language Modeling

In applications involving conversational speech, data sparsity is a limi...