AI4D – African Language Program

by Kathleen Siminyu, et al.

Advances in speech and language technologies enable tools such as voice search, text-to-speech, speech recognition and machine translation. However, these tools are only available for high-resource languages such as English, French or Chinese. African languages are considered low-resource in the digital context, and without foundational digital resources for them, these advanced tools remain out of reach. This work details the AI4D – African Language Program, a three-part project that 1) incentivised the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge, 2) supported research fellows for 3-4 months to create datasets annotated for NLP tasks, and 3) hosted competitive machine learning challenges based on these datasets. Key outcomes so far include 1) the creation of at least nine open-source African language datasets annotated for a variety of ML tasks, and 2) the creation of baseline models for these datasets through the competitive ML challenges.


