Diversity-based core-set selection for text-to-speech with linguistic and acoustic features

09/15/2023
by Kentaro Seki, et al.

This paper proposes a method for extracting a lightweight subset from a text-to-speech (TTS) corpus while preserving synthetic speech quality. In recent years, methods have been proposed for constructing large-scale TTS corpora by collecting diverse data from massive sources such as audiobooks and YouTube. Although these methods have gained significant attention for enhancing the expressive capabilities of TTS systems, they often prioritize collecting vast amounts of data without considering practical constraints such as storage capacity and training computation time, which limit how much data can actually be used. Consequently, data must be collected efficiently within these volume constraints. To address this, we propose a method for selecting a core subset (known as a core-set) from a TTS corpus on the basis of a diversity metric, which measures the degree to which a subset covers a wide range of the corpus. Experimental results demonstrate that our proposed method performs significantly better than the baseline phoneme-balanced data selection across languages and corpus sizes.
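
As a rough illustration of diversity-driven subset selection, the sketch below greedily picks the utterance that lies farthest from everything already chosen (farthest-point, or k-center greedy, selection). This is a stand-in technique chosen for clarity: the abstract does not specify the paper's diversity metric or how its linguistic and acoustic features are computed, so the function name `select_core_set`, the Euclidean distance, the choice of starting item, and the toy feature values are all assumptions.

```python
import math


def select_core_set(features, budget):
    """Greedy farthest-point selection over per-utterance feature vectors.

    Returns the indices of `budget` items. Illustrative only: this is not
    the selection rule or the diversity metric used in the paper.
    """
    if not 0 < budget <= len(features):
        raise ValueError("budget must be between 1 and len(features)")

    selected = [0]  # arbitrary starting utterance (assumption)
    # Distance from every item to its nearest already-selected item.
    min_dist = [math.dist(f, features[0]) for f in features]

    while len(selected) < budget:
        # Pick the item that is farthest from the current subset.
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Refresh nearest-selected distances with the newly added item.
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[nxt]))
    return selected


# Hypothetical 2-D "utterance features": two tight clusters and one outlier.
toy_features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
core = select_core_set(toy_features, 3)
print(core)
```

With a budget of three, the toy run keeps one item from each cluster plus the outlier and skips the near-duplicates, which is the intuition behind preferring coverage over raw data volume.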

research
10/26/2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection

This paper proposes a method for selecting training data for text-to-spe...
research
02/26/2023

Speech Corpora Divergence Based Unsupervised Data Selection for ASR

Selecting application scenarios matching data is important for the autom...
research
08/30/2018

Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis

Although end-to-end text-to-speech (TTS) models such as Tacotron have sh...
research
12/11/2022

BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm

The performance of speech-processing models is heavily influenced by the...
research
05/28/2023

Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS

Flow-based generative models are widely used in text-to-speech (TTS) sys...
research
10/21/2022

A Textless Metric for Speech-to-Speech Comparison

This paper proposes a textless speech-to-speech comparison metric that a...
