
A reproduction of Apple's bi-directional LSTM models for language identification in short strings

by Mads Toftrup, et al.

Language identification is the task of determining the language a document is written in. For applications such as automatic spell checker selection, language identification must work on very short strings, such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms current open-source language identifiers. We further find that its mistakes are due to confusion between related languages.




Automatic Language Identification System for Hindi and Magahi

Language identification has become a prerequisite for all kinds of autom...

Language Identification of Devanagari Poems

Language Identification is a very important part of several text process...

Short Text Language Identification for Under Resourced Languages

The paper presents a hierarchical naive Bayesian and lexicon based class...

LanideNN: Multilingual Language Identification on Character Window

In language identification, a common first step in natural language proc...

Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture

An accurate language identification tool is an absolute necessity for bu...

LIDE: Language Identification from Text Documents

The increase in the use of microblogging came along with the rapid growt...

Language Identification on Massive Datasets of Short Message using an Attention Mechanism CNN

Language Identification (LID) is a challenging task, especially when the...