LIDE: Language Identification from Text Documents

by Priyank Mathur, et al.

The rise of microblogging has been accompanied by rapid growth in short linguistic data. At the same time, deep learning is considered the new frontier for automatically extracting meaningful information from large amounts of raw data. In this study, we bring these two emerging fields together to build a robust, on-demand language identifier, the Language Identification Engine (LIDE). As a result, we achieved 95.12% accuracy on the Discriminating between Similar Languages (DSL) Shared Task 2015 dataset, which is comparable to the maximum reported accuracy of 95.54%.


Short Text Language Identification for Under Resourced Languages

The paper presents a hierarchical naive Bayesian and lexicon based class...

Automatic Language Identification System for Hindi and Magahi

Language identification has become a prerequisite for all kinds of autom...

Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpus

This article introduces the Wanca 2017 corpus of texts crawled from the ...

Albanian Language Identification in Text Documents

In this work we investigate the accuracy of standard and state-of-the-ar...

A reproduction of Apple's bi-directional LSTM models for language identification in short strings

Language Identification is the task of identifying a document's language...

Language Model Adaptation for Language and Dialect Identification of Text

This article describes an unsupervised language model adaptation approac...

Language identification as improvement for lip-based biometric visual systems

Language has always been one of humanity's defining characteristics. Vis...