Speech Resources in the Tamasheq Language

01/13/2022
by Marcely Zanon Boito, et al.

In this paper, we present two datasets for Tamasheq, a developing language mainly spoken in Mali and Niger. These two datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from the daily broadcast news of Studio Kalangou (Niger) and Studio Tamani (Mali). We share (i) a large amount of unlabeled audio data (671 hours) in five languages: French from Niger, Fulfulde, Hausa, Tamasheq and Zarma, and (ii) a smaller parallel corpus of audio recordings (17 hours) in Tamasheq, with utterance-level translations in French. All of this data is shared under the Creative Commons BY-NC-ND 3.0 license. We hope these resources will inspire the speech community to develop and benchmark models using the Tamasheq language.

research
10/10/2017

A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments

Most speech and language technologies are trained with massive amounts o...
research
12/01/2020

NHSS: A Speech and Singing Parallel Database

We present a database of parallel recordings of speech and singing, coll...
research
11/30/2021

Challenges in Developing LRs for Non-Scheduled Languages: A Case of Magahi

Magahi is an Indo-Aryan Language, spoken mainly in the Eastern parts of ...
research
10/27/2022

Masked Autoencoders Are Articulatory Learners

Articulatory recordings track the positions and motion of different arti...
research
06/20/2023

HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation

We introduce HK-LegiCoST, a new three-way parallel corpus of Cantonese-E...
research
08/29/2019

Classifying topics in speech when all you have is crummy translations

Given a large amount of unannotated speech in a language with few resour...
research
04/08/2020

The Spotify Podcasts Dataset

Podcasts are a relatively new form of audio media. Episodes appear on a ...
