A reproduction of Apple's bi-directional LSTM models for language identification in short strings

02/11/2021
by Mads Toftrup, et al.

Language identification is the task of determining the language a document is written in. For applications such as automatic spell-checker selection, language identification must work on very short strings, such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms current open-source language identifiers. We further find that its mistakes are due to confusion between related languages.

04/13/2018

Automatic Language Identification System for Hindi and Magahi

Language identification has become a prerequisite for all kinds of autom...
12/30/2020

Language Identification of Devanagari Poems

Language Identification is a very important part of several text process...
11/18/2019

Short Text Language Identification for Under Resourced Languages

The paper presents a hierarchical naive Bayesian and lexicon based class...
01/12/2017

LanideNN: Multilingual Language Identification on Character Window

In language identification, a common first step in natural language proc...
08/21/2018

Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture

An accurate language identification tool is an absolute necessity for bu...
01/13/2017

LIDE: Language Identification from Text Documents

The increase in the use of microblogging came along with the rapid growt...
10/15/2019

Language Identification on Massive Datasets of Short Message using an Attention Mechanism CNN

Language Identification (LID) is a challenging task, especially when the...