AfroLID: A Neural Language Identification Tool for African Languages

by Ife Adebara, et al.

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural LID toolkit for 517 African languages and varieties. AfroLID exploits a multi-domain web dataset manually curated from across 14 language families utilizing five orthographic systems. When evaluated on our blind Test set, AfroLID achieves an F1 score of 95.89. We also compare AfroLID to five existing LID tools that each cover a small number of African languages and find that it outperforms them on most languages. We further show the utility of AfroLID in the wild by testing it on the acutely under-served Twitter domain. Finally, we offer a number of controlled case studies and perform a linguistically motivated error analysis that allow us to showcase both AfroLID's powerful capabilities and its limitations.


Automatic Language Identification System for Hindi and Magahi

Language identification has become a prerequisite for all kinds of autom...

Discriminating Between Similar Nordic Languages

Automatic language identification is a challenging problem. Discriminati...

Open-Set Language Identification

We present the first open-set language identification experiments using ...

Comparative Study Of Data Mining Query Languages

Since formulation of Inductive Database (IDB) problem, several Data Mini...

SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

A broad goal in natural language processing (NLP) is to develop a system...

Tuplemax Loss for Language Identification

In many scenarios of a language identification task, the user will speci...

Bootstrapping Techniques for Polysynthetic Morphological Analysis

Polysynthetic languages have exceptionally large and sparse vocabularies...