Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder

12/06/2018
by   Qiao Tian, et al.

Neural network based vocoders, typically WaveNet, have achieved spectacular performance for text-to-speech (TTS) in recent years. Although the state-of-the-art parallel WaveNet has addressed the issue of real-time waveform generation, two problems remain. First, due to the noisy input signal of the model, there is still a gap between the quality of generated and natural waveforms. Second, a parallel WaveNet is trained under a distilled training framework, which makes it tedious to adapt a well-trained model to a new speaker. To address these two problems, this paper proposes an end-to-end adaptation method based on the generative adversarial network (GAN), which can reduce the computational cost of training for new speaker adaptation. Our subjective experiments show that the proposed training method can further reduce the quality gap between generated and natural waveforms.

research
11/01/2021

RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses

Most GAN(Generative Adversarial Network)-based approaches towards high-f...
research
04/26/2023

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis

This paper proposes a source-filter-based generative adversarial neural ...
research
04/06/2021

Deep learning for prediction of complex geology ahead of drilling

During a geosteering operation the well path is intentionally adjusted i...
research
07/31/2018

Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

Recent neural networks such as WaveNet and sampleRNN that learn directly...
research
03/25/2019

Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks

Speech is a rich biometric signal that contains information about the id...
research
04/01/2018

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification

I-vector based text-independent speaker verification (SV) systems often ...
research
05/15/2020

Unsupervised Cross-Domain Speech-to-Speech Conversion with Time-Frequency Consistency

In recent years generative adversarial network (GAN) based models have b...
