Effect of data reduction on sequence-to-sequence neural TTS

11/15/2018
by Javier Latorre, et al.

Recent speech synthesis systems based on sampling from autoregressive neural network models can generate speech almost indistinguishable from human recordings. However, these models require large amounts of data. This paper shows that a lack of data from one speaker can be compensated for with data from other speakers. The naturalness of Tacotron2-like models trained on a blend of 5k utterances from 7 speakers is better than that of speaker-dependent models trained on 15k utterances, while in terms of stability, multi-speaker models are always better. We also demonstrate that models mixing only 1250 utterances from a target speaker with 5k utterances from another 6 speakers can produce significantly better quality than state-of-the-art DNN-guided unit selection systems trained on more than 10 times the data from the target speaker.
