A reproduction of Apple's bi-directional LSTM models for language identification in short strings

02/11/2021
by Mads Toftrup, et al.

Language identification is the task of identifying a document's language. For applications such as automatic spell checker selection, language identification must work on very short strings, such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms current open-source language identifiers. We further find that its language identification mistakes are due to confusion between related languages.
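
To make the term "bi-LSTM" concrete, the following is a minimal sketch of a bidirectional LSTM forward pass over a short string, written in plain Python. It is not the reproduced model and not Apple's design: the label set, layer sizes, byte-level input, random untrained weights, and the choice to classify from the concatenated final forward and backward states are all assumptions made for illustration.

```python
import math
import random

# All of the constants below are hypothetical, chosen only to keep the sketch small.
LANGS = ["da", "en", "de"]  # illustrative label set
EMBED = 6                   # embedding size
HIDDEN = 8                  # hidden size per direction
VOCAB = 256                 # UTF-8 bytes as the input alphabet (an assumption)

rng = random.Random(0)      # fixed seed so the untrained weights are reproducible


def matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    def __init__(self, n_in, n_hid):
        self.n_hid = n_hid
        # One stacked weight matrix for the input, forget, candidate and output gates.
        self.w = matrix(4 * n_hid, n_in + n_hid)
        self.b = [0.0] * (4 * n_hid)

    def step(self, x, h, c):
        z = [a + b for a, b in zip(matvec(self.w, x + h), self.b)]
        n = self.n_hid
        i = [sigmoid(v) for v in z[:n]]
        f = [sigmoid(v) for v in z[n:2 * n]]
        g = [math.tanh(v) for v in z[2 * n:3 * n]]
        o = [sigmoid(v) for v in z[3 * n:]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        """Process a sequence and return the final hidden state."""
        h = [0.0] * self.n_hid
        c = [0.0] * self.n_hid
        for x in xs:
            h, c = self.step(x, h, c)
        return h


embedding = matrix(VOCAB, EMBED)
forward_cell = LSTMCell(EMBED, HIDDEN)
backward_cell = LSTMCell(EMBED, HIDDEN)
output_layer = matrix(len(LANGS), 2 * HIDDEN)


def predict_proba(text):
    """Return a probability per language for a short string (untrained weights)."""
    xs = [embedding[b] for b in text.encode("utf-8")]
    h_fwd = forward_cell.run(xs)        # left-to-right pass
    h_bwd = backward_cell.run(xs[::-1])  # right-to-left pass
    logits = matvec(output_layer, h_fwd + h_bwd)
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
    total = sum(exps)
    return {lang: e / total for lang, e in zip(LANGS, exps)}
```

Calling `predict_proba("hej med dig")` returns a dictionary with one probability per label. Because the weights are random, the probabilities carry no meaning until the model is trained; the sketch only shows how the two directions read the same string and are combined before classification.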


