AI4D – African Language Program

04/06/2021
by Kathleen Siminyu, et al.

Advances in speech and language technologies enable tools such as voice search, text-to-speech, speech recognition and machine translation. However, these tools are only available for high-resource languages such as English, French or Chinese. Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This work details the AI4D – African Language Program, a three-part project that 1) incentivised the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge, 2) supported research fellows for a period of 3-4 months to create datasets annotated for NLP tasks, and 3) hosted competitive Machine Learning challenges on the basis of these datasets. Key outcomes of the work so far include 1) the creation of 9+ open-source African language datasets annotated for a variety of ML tasks, and 2) the creation of baseline models for these datasets through the hosting of competitive ML challenges.

research
07/23/2020

AI4D – African Language Dataset Challenge

As language and speech technologies become more advanced, the lack of fu...
research
03/30/2022

Vakyansh: ASR Toolkit for Low Resource Indic languages

We present Vakyansh, an end to end toolkit for Speech Recognition in Ind...
research
12/11/2019

A Collaborative Ecosystem for Digital Coptic Studies

Scholarship on underresourced languages bring with them a variety of cha...
research
03/20/2020

Language Technology Programme for Icelandic 2019-2023

In this paper, we describe a new national language technology programme ...
research
02/18/2020

Investigating an approach for low resource language dataset creation, curation and classification: Setswana and Sepedi

The recent advances in Natural Language Processing have been a boon for ...
research
02/23/2018

The JHU Speech LOREHLT 2017 System: Cross-Language Transfer for Situation-Frame Detection

We describe the system our team used during NIST's LoReHLT (Low Resource...
research
04/21/2020

Learnings from Technological Interventions in a Low Resource Language: A Case-Study on Gondi

The primary obstacle to developing technologies for low-resource languag...