ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus

07/29/2023
by   Tolulope Ogunremi, et al.

We introduce the ÌròyìnSpeech corpus, a new dataset motivated by a desire to increase the amount of high-quality, freely available, contemporary Yorùbá speech data. We release a multi-purpose dataset that can be used for both text-to-speech (TTS) and automatic speech recognition (ASR) tasks. We curated text sentences from the news and creative writing domains under an open license (CC-BY-4.0) and had multiple speakers record each sentence. We provide 5,000 of our utterances to the Common Voice platform to crowdsource transcriptions online. The dataset has 38.5 hours of data in total, recorded by 80 volunteers.
