TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS

05/24/2022
by   Xulong Zhang, et al.

Personalized speech synthesis through text-to-speech (TTS) applications is in high demand. However, previous TTS models require a large amount of speech from the target speaker for training, and recording many utterances from a single target speaker is costly and difficult. Data augmentation of the speech is one solution, but it leads to low-quality synthesized speech. Multi-speaker TTS models have been proposed to address this issue, but the imbalance in the number of utterances per speaker leads to a voice similarity problem. We propose the Target Domain Adaptation Speech Synthesis Network (TDASS) to address these issues. Built on the backbone of Tacotron2, a high-quality TTS model, TDASS introduces a self-interested classifier to reduce the influence of non-target speakers. In addition, a special gradient reversal layer that applies different operations to target and non-target speakers is added to the classifier. We evaluate the model on a Chinese speech corpus; the experiments show that the proposed method outperforms the baseline in terms of voice quality and voice similarity.
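
The abstract does not say which operations the gradient reversal layer applies to target and non-target speakers. As a rough illustration only, the sketch below assumes one plausible rule: the forward pass is the identity, and in the backward pass the gradient is passed through unchanged for the target speaker and reversed and scaled by a factor for non-target speakers. The function names, the `is_target` flag, and the `lam` factor are all hypothetical and are not taken from the paper.

```python
from typing import List


def grl_forward(features: List[float]) -> List[float]:
    """Forward pass of a gradient reversal layer: the identity function."""
    return list(features)


def grl_backward(grad_output: List[float], is_target: bool, lam: float = 1.0) -> List[float]:
    """Backward pass under an ASSUMED rule (not confirmed by the abstract).

    Target speaker: the gradient passes through unchanged.
    Non-target speaker: the gradient is reversed and scaled by lam.
    """
    scale = 1.0 if is_target else -lam
    return [scale * g for g in grad_output]


if __name__ == "__main__":
    upstream = [0.5, -1.0]
    print(grl_backward(upstream, is_target=True))
    print(grl_backward(upstream, is_target=False, lam=2.0))
```

In an actual model this logic would live inside an autograd-aware layer placed between the shared encoder features and the speaker classifier; the plain-list version here only shows the sign and scale behavior.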

research
04/21/2022

Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation

Data augmentation via voice conversion (VC) has been successfully applie...
research
03/05/2018

Linear networks based speaker adaptation for speech synthesis

Speaker adaptation methods aim to create fair quality synthesis speech v...
research
06/12/2020

Neural voice cloning with a few low-quality samples

In this paper, we explore the possibility of speech synthesis from low q...
research
02/16/2022

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

State-of-the-art text-to-speech (TTS) systems require several hours of r...
research
04/07/2022

Arabic Text-To-Speech (TTS) Data Preparation

People may be puzzled by the fact that voice over recordings data sets e...
research
08/02/2021

Speaker Adaptation with Continuous Vocoder-based DNN-TTS

Traditional vocoder-based statistical parametric speech synthesis can be...
research
05/29/2023

ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation

There are significant challenges for speaker adaptation in text-to-speec...
