AdaVocoder: Adaptive Vocoder for Custom Voice

03/18/2022
by Xin Yuan, et al.

Custom voice aims to build a personal speech synthesis system by adapting a source speech synthesis model to a target speaker using only a few recordings of that speaker. The usual solution combines an adaptive acoustic model with a robust vocoder. However, training a robust vocoder usually requires a multi-speaker dataset covering a wide range of age groups and timbres, so that the trained vocoder generalizes to unseen speakers. Collecting such a dataset is difficult, and its distribution always has a mismatch with the distribution of the target speaker's data. This paper proposes an adaptive vocoder for custom voice that approaches these problems from a different perspective. The adaptive vocoder mainly uses a cross-domain consistency loss to address the overfitting that GAN-based neural vocoders encounter during transfer learning in few-shot scenarios. We construct two adaptive vocoders, AdaMelGAN and AdaHiFi-GAN. First, we pre-train the source vocoder models on the AISHELL3 and CSMSC datasets, respectively. Then, we fine-tune them on the internal dataset VXI-children with only a small amount of adaptation data. The empirical results show that a high-quality custom voice system can be built by combining an adaptive acoustic model with an adaptive vocoder.
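
The abstract does not spell out the form of the cross-domain consistency loss, so the following is only a minimal sketch of one common formulation from few-shot GAN adaptation, not the authors' implementation. It assumes that, for the same batch of inputs, intermediate generator features are available from both the frozen source vocoder and the vocoder being fine-tuned. The loss penalizes changes in the pairwise similarity structure between the two. All function names, the use of cosine similarity, and the KL direction are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(feats, i):
    """Softmax over the similarities between sample i and every other sample."""
    sims = [cosine(feats[i], feats[j]) for j in range(len(feats)) if j != i]
    return softmax(sims)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_domain_consistency_loss(source_feats, adapted_feats):
    """Mean KL between the adapted and source pairwise-similarity distributions.

    source_feats:  features from the frozen source vocoder (hypothetical input)
    adapted_feats: features from the vocoder being fine-tuned, same batch order
    """
    n = len(source_feats)
    if n != len(adapted_feats) or n < 2:
        raise ValueError("need two equally sized batches with at least 2 samples")
    total = 0.0
    for i in range(n):
        p_source = similarity_distribution(source_feats, i)
        p_adapted = similarity_distribution(adapted_feats, i)
        # Assumed direction: adapted distribution measured against the source one.
        total += kl_divergence(p_adapted, p_source)
    return total / n


if __name__ == "__main__":
    source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    adapted = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
    print(cross_domain_consistency_loss(source, source))   # unchanged structure
    print(cross_domain_consistency_loss(source, adapted))  # changed structure
```

In a real fine-tuning loop a term like this would be added to the usual GAN losses and computed with a differentiable tensor library. The intent is that a vocoder adapted on only a few recordings keeps the relative structure the source model learned instead of collapsing onto the few target samples.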


research
03/01/2021

AdaSpeech: Adaptive Text to Speech for Custom Voice

Custom voice, a specific text to speech (TTS) service in commercial spee...
research
07/05/2022

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

The zero-shot scenario for speech generation aims at synthesizing a nove...
research
10/12/2021

Adapting TTS models For New Speakers using Transfer Learning

Training neural text-to-speech (TTS) models for a new speaker typically ...
research
04/20/2021

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

Text to speech (TTS) is widely used to synthesize personal voice for a t...
research
04/07/2022

Self supervised learning for robust voice cloning

Voice cloning is a difficult task which requires robust and informative ...
research
03/21/2023

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Personalized TTS is an exciting and highly desired application that allo...
research
02/25/2018

Multi-channel Adaptive Dereverberation Tracing Abrupt Position Change of Target Speaker

Adaptive algorithm based on multi-channel linear prediction is an effect...
