BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm

by Yu-Wen Chen, et al.

The performance of speech-processing models is heavily influenced by the speech corpus used for training and evaluation. In this study, we propose the BAlanced Script PROducer (BASPRO) system, which automatically constructs a phonetically balanced and rich set of Chinese sentences for collecting Mandarin Chinese speech data. First, we used pretrained natural language processing systems to extract ten-character candidate sentences from a large corpus of Chinese news texts. Then, we applied a genetic algorithm-based method to select 20 phonetically balanced sentence sets, each containing 20 sentences, from the candidate sentences. Using BASPRO, we obtained a recording script called TMNews, which contains 400 ten-character sentences. TMNews covers 84% of the syllables used in the real world, and its syllable distribution has a cosine similarity of 0.96 to the real-world syllable distribution. We converted the script into a speech corpus using two text-to-speech systems. Using the designed speech corpus, we evaluated the performance of speech enhancement (SE) and automatic speech recognition (ASR), which are among the most important regression-based and classification-based speech-processing tasks, respectively. The experimental results show that the SE and ASR models trained on the designed speech corpus outperform their counterparts trained on a randomly composed speech corpus.
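The selection step described above can be illustrated with a small sketch. The abstract does not specify the fitness function or the genetic operators, so everything below is an assumption made for illustration: sentences are represented as lists of syllable strings, fitness is the cosine similarity between the selected set's syllable distribution and a target distribution, and the function names (`syllable_distribution`, `fitness`, `select_set`) and parameters (`pop_size`, `generations`, the 0.3 mutation rate) are hypothetical rather than taken from BASPRO.

```python
import math
import random
from collections import Counter


def syllable_distribution(sentences, vocab):
    """Relative frequency of each syllable in `vocab` across the sentences."""
    counts = Counter(syl for sent in sentences for syl in sent)
    total = sum(counts[v] for v in vocab) or 1
    return [counts[v] / total for v in vocab]


def cosine_similarity(p, q):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def fitness(indices, candidates, vocab, target):
    """Assumed fitness: similarity of the chosen set's distribution to the target."""
    chosen = [candidates[i] for i in indices]
    return cosine_similarity(syllable_distribution(chosen, vocab), target)


def select_set(candidates, vocab, target, set_size,
               pop_size=30, generations=50, seed=0):
    """Toy genetic algorithm that picks `set_size` distinct candidate indices."""
    rng = random.Random(seed)
    n = len(candidates)
    score = lambda ind: fitness(ind, candidates, vocab, target)
    population = [rng.sample(range(n), set_size) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist truncation selection: the better half survives unchanged.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child from the union of both parents.
            pool = list(dict.fromkeys(a + b))
            child = rng.sample(pool, set_size)
            # Mutation: occasionally swap one sentence for an unused one.
            if rng.random() < 0.3:
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(set_size)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    return max(population, key=score)


if __name__ == "__main__":
    # Toy data: four made-up syllables and 40 random ten-syllable "sentences".
    vocab = ["ma", "shi", "de", "ni"]
    data_rng = random.Random(1)
    candidates = [[data_rng.choice(vocab) for _ in range(10)] for _ in range(40)]
    target = [0.4, 0.3, 0.2, 0.1]  # hypothetical real-world distribution
    best = select_set(candidates, vocab, target, set_size=5)
    print(sorted(best), round(fitness(best, candidates, vocab, target), 4))
```

Because the better half of each generation is carried over unchanged, the best fitness in the population never decreases. A real system would also need a syllable extractor for Chinese text and some way to keep the 20 selected sets disjoint, neither of which is modeled here.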


