AI4D – African Language Dataset Challenge

07/23/2020
by Kathleen Siminyu, et al.

As language and speech technologies become more advanced, the lack of fundamental digital resources for African languages, such as data, spell checkers and part-of-speech taggers, means that the digital divide between these languages and others keeps growing. This work details the organisation of the AI4D - African Language Dataset Challenge, an effort to incentivise the creation, organisation and discovery of African language datasets through a competitive challenge. We particularly encouraged the submission of annotated datasets that can be used for training task-specific supervised machine learning models.

04/06/2021

AI4D – African Language Program

Advances in speech and language technologies enable tools such as voice-...
10/10/2017

A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments

Most speech and language technologies are trained with massive amounts o...
12/17/2021

Challenge Dataset of Cognates and False Friend Pairs from Indian Languages

Cognates are present in multiple variants of the same text across differ...
02/23/2018

The JHU Speech LOREHLT 2017 System: Cross-Language Transfer for Situation-Frame Detection

We describe the system our team used during NIST's LoReHLT (Low Resource...
04/17/2020

AlloVera: A Multilingual Allophone Database

We introduce a new resource, AlloVera, which provides mappings from 218 ...
03/17/2022

Dim Wihl Gat Tun: The Case for Linguistic Expertise in NLP for Underdocumented Languages

Recent progress in NLP is driven by pretrained models leveraging massive...
10/25/2020

AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification

The AutoSpeech challenge calls for automated machine learning (AutoML) s...