CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

03/27/2019
by Kyubyong Park, et al.

We describe the development of CSS10, a collection of single-speaker speech datasets for ten languages. It is composed of short audio clips from LibriVox audiobooks and their aligned texts. To validate the quality of the datasets, we train two neural text-to-speech models on each one and then conduct Mean Opinion Score (MOS) tests on the synthesized speech samples. We make our datasets, pre-trained models, and test resources publicly available, and we hope they will be useful for future speech tasks.
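
A Mean Opinion Score is the average of listener ratings on an absolute category scale, usually reported with a confidence interval. The sketch below shows that computation under stated assumptions: ratings are integers on a 1-5 scale, and the 95% interval uses a normal approximation (z = 1.96). Neither detail is taken from this abstract, and the function name `mean_opinion_score` and the sample ratings are hypothetical, so this is an illustration of MOS in general rather than the authors' evaluation procedure.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate confidence interval).

    Assumption: ratings are on a 1-5 scale and the interval uses a
    normal approximation; this is not the paper's stated method.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Standard error of the mean, scaled by the z-value for the interval.
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical listener ratings for one language's synthesized samples.
ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

In practice, each synthesized clip is rated by several listeners, and the score is aggregated per system, so one MOS value is produced for each model and language.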
