AI4D – African Language Dataset Challenge

by Kathleen Siminyu, et al.

As language and speech technologies become more advanced, the lack of fundamental digital resources for African languages, such as data, spell checkers and part-of-speech taggers, means that the digital divide between these languages and others keeps growing. This work details the organisation of the AI4D – African Language Dataset Challenge, an effort to incentivize the creation, organization and discovery of African language datasets through a competitive challenge. We particularly encouraged the submission of annotated datasets that can be used to train task-specific supervised machine learning models.



AI4D – African Language Program

Advances in speech and language technologies enable tools such as voice-...

A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments

Most speech and language technologies are trained with massive amounts o...

Challenge Dataset of Cognates and False Friend Pairs from Indian Languages

Cognates are present in multiple variants of the same text across differ...

The JHU Speech LOREHLT 2017 System: Cross-Language Transfer for Situation-Frame Detection

We describe the system our team used during NIST's LoReHLT (Low Resource...

AlloVera: A Multilingual Allophone Database

We introduce a new resource, AlloVera, which provides mappings from 218 ...

Dim Wihl Gat Tun: The Case for Linguistic Expertise in NLP for Underdocumented Languages

Recent progress in NLP is driven by pretrained models leveraging massive...

AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification

The AutoSpeech challenge calls for automated machine learning (AutoML) s...