Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora

04/01/2019
by Hieu-Thi Luong, et al.

When the available data of a target speaker is insufficient to train a high-quality speaker-dependent neural text-to-speech (TTS) system, we can combine data from multiple speakers and train a multi-speaker TTS model instead. Many studies have shown that a neural multi-speaker TTS model trained with a small amount of data from multiple speakers combined can generate synthetic speech with better quality and stability than a speaker-dependent one. However, when the amount of data from each speaker is highly unbalanced, the best way to make use of the excess data remains unknown. Our experiments showed that simply combining all available data from every speaker to train a multi-speaker model performs better than, or at least similarly to, its speaker-dependent counterpart. Moreover, by using an ensemble multi-speaker model, in which each subsystem is trained on a subset of the available data, we can further improve the quality of the synthetic speech, especially for underrepresented speakers whose training data is limited.
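The ensemble idea in the abstract can be illustrated with a small sketch. The abstract does not say how the subsets are formed or how the subsystem outputs are combined, so everything below is an assumption made for illustration: the helper names `make_subsets` and `ensemble_predict`, the `small_speaker_threshold` parameter, the round-robin split of data-rich speakers, and the simple averaging of per-frame outputs are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import fmean


def make_subsets(utterances, num_subsets, small_speaker_threshold):
    """Split a speaker-imbalanced corpus into training subsets.

    utterances: list of (speaker_id, utterance_id) pairs.
    Assumption (not from the paper): speakers with at most
    `small_speaker_threshold` utterances appear in full in every subset,
    while data-rich speakers are dealt round-robin across the subsets.
    """
    by_speaker = defaultdict(list)
    for speaker, utt in utterances:
        by_speaker[speaker].append(utt)

    subsets = [[] for _ in range(num_subsets)]
    for speaker, utts in by_speaker.items():
        if len(utts) <= small_speaker_threshold:
            # Underrepresented speaker: every subsystem sees all of their data.
            for subset in subsets:
                subset.extend((speaker, u) for u in utts)
        else:
            # Data-rich speaker: spread utterances evenly over the subsets.
            for i, u in enumerate(utts):
                subsets[i % num_subsets].append((speaker, u))
    return subsets


def ensemble_predict(subsystems, text, speaker):
    """Average the per-frame outputs of all subsystems.

    Each subsystem is any callable (text, speaker) -> list of floats.
    Averaging is an assumed combination rule, chosen for simplicity.
    """
    outputs = [model(text, speaker) for model in subsystems]
    return [fmean(frame_values) for frame_values in zip(*outputs)]


# Toy corpus: speaker "A" has six utterances, speaker "B" only two.
corpus = [("A", f"a{i}") for i in range(6)] + [("B", "b0"), ("B", "b1")]
subsets = make_subsets(corpus, num_subsets=3, small_speaker_threshold=2)

# Stand-in "models" that return fixed two-frame outputs.
toy_models = [lambda t, s: [1.0, 2.0], lambda t, s: [3.0, 4.0]]
averaged = ensemble_predict(toy_models, "hello", "B")
```

In this toy setup each of the three subsets holds two utterances from speaker "A" and both utterances from speaker "B", so the per-speaker amounts within a subset are balanced. A real system would train one multi-speaker TTS model per subset and would likely combine acoustic features rather than plain lists of floats.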
