AI4D – African Language Program

04/06/2021
by Kathleen Siminyu, et al.

Advances in speech and language technologies enable tools such as voice search, text-to-speech, speech recognition and machine translation. However, these tools are only available for high-resource languages such as English, French or Chinese. Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This work details the AI4D – African Language Program, a three-part project that 1) incentivised the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge, 2) supported research fellows for a period of 3-4 months to create datasets annotated for NLP tasks, and 3) hosted competitive Machine Learning challenges based on these datasets. Key outcomes of the work so far include 1) the creation of 9+ open-source African language datasets annotated for a variety of ML tasks, and 2) the creation of baseline models for these datasets through the hosting of competitive ML challenges.

07/23/2020

AI4D – African Language Dataset Challenge

As language and speech technologies become more advanced, the lack of fu...
07/20/2020

CoVoST 2 and Massively Multilingual Speech-to-Text Translation

Speech translation has recently become an increasingly popular topic of ...
12/11/2019

A Collaborative Ecosystem for Digital Coptic Studies

Scholarship on underresourced languages bring with them a variety of cha...
03/20/2020

Language Technology Programme for Icelandic 2019-2023

In this paper, we describe a new national language technology programme ...
02/18/2020

Investigating an approach for low resource language dataset creation, curation and classification: Setswana and Sepedi

The recent advances in Natural Language Processing have been a boon for ...
10/14/2020

Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

This paper presents an overview of a program designed to address the gro...
08/27/2020

DAVE: Deriving Automatically Verilog from English

While specifications for digital systems are provided in natural languag...