Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

08/28/2022
by Lev Finkelstein, et al.

Transfer tasks in text-to-speech (TTS) synthesis, where one or more aspects of the speech of one set of speakers are transferred to another set of speakers that do not originally feature these aspects, remain challenging. One difficulty is that models with high-quality transfer capabilities can suffer from stability issues, making them impractical for critical user-facing tasks. This paper demonstrates that transfer can be obtained by training a robust TTS system on data generated by a less robust TTS system designed for a high-quality transfer task; in particular, a CHiVE-BERT monolingual TTS system is trained on the output of a Tacotron model designed for accent transfer. While some quality loss is inevitable with this approach, experimental results show that models trained on synthetic data in this way can produce high-quality audio displaying accent transfer, while preserving speaker characteristics such as speaking style.
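
The data flow described in the abstract can be illustrated with a minimal Python sketch: a teacher model synthesizes audio for each text and target speaker, and the resulting (text, speaker, audio) triples form the corpus on which the more robust system would then be trained. Everything named here (`Utterance`, `build_synthetic_corpus`, `fake_teacher`, the list-of-floats audio type, and the empty-output check) is a hypothetical placeholder for illustration, not the paper's code or the API of Tacotron or CHiVE-BERT.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Waveform = List[float]  # placeholder audio type for this sketch


@dataclass
class Utterance:
    text: str
    speaker_id: str
    audio: Waveform


def build_synthetic_corpus(
    texts: Sequence[str],
    speaker_ids: Sequence[str],
    teacher_synthesize: Callable[[str, str], Waveform],
) -> List[Utterance]:
    """Synthesize every text for every target speaker with the teacher model."""
    corpus = []
    for speaker_id in speaker_ids:
        for text in texts:
            audio = teacher_synthesize(text, speaker_id)
            # Assumption: skip empty outputs, a stand-in for whatever
            # stability filtering a real pipeline would apply.
            if len(audio) == 0:
                continue
            corpus.append(Utterance(text, speaker_id, audio))
    return corpus


def fake_teacher(text: str, speaker_id: str) -> Waveform:
    """Toy stand-in for the accent-transfer model: one sample per character."""
    if text == "":
        return []
    return [0.0] * len(text)


corpus = build_synthetic_corpus(
    ["hello", "", "good morning"], ["spk_a", "spk_b"], fake_teacher
)
print(len(corpus))  # number of usable synthetic training utterances
```

In a real setup, `fake_teacher` would be replaced by inference with the accent-transfer model, and `corpus` would be handed to the training routine of the robust TTS system.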

research
06/06/2021

Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

With rapid progress in neural text-to-speech (TTS) models, personalized ...
research
07/01/2020

LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity

In speech recognition problems, data scarcity often poses an issue due t...
research
11/23/2021

Guided-TTS: Text-to-Speech with Untranscribed Speech

Most neural text-to-speech (TTS) models require <speech, transcript> pai...
research
09/08/2021

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

Cross-speaker style transfer (CSST) in text-to-speech (TTS) synthesis ai...
research
06/17/2019

Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models

Modern text-to-speech (TTS) systems are able to generate audio that soun...
research
03/03/2023

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

Speech restoration (SR) is a task of converting degraded speech signals ...
research
11/29/2022

Evaluating and reducing the distance between synthetic and real speech distributions

While modern Text-to-Speech (TTS) systems can produce speech rated highl...
