BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus

07/07/2022
by Josh Meyer, et al.

BibleTTS is a large, high-quality, open speech dataset for ten languages spoken in Sub-Saharan Africa. The corpus contains up to 86 hours of aligned, studio-quality, 48 kHz single-speaker recordings per language, enabling the development of high-quality text-to-speech models. The ten languages represented are Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, and Yoruba. This corpus is a derivative work of Bible recordings made and released by the Open.Bible project from Biblica. We have aligned, cleaned, and filtered the original recordings, and additionally hand-checked a subset of the alignments for each language. We present results for text-to-speech models trained with Coqui TTS. The data is released under a commercial-friendly CC-BY-SA license.
