GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus

10/24/2020
by Zining Zhang, et al.

Non-parallel many-to-many voice conversion has recently attracted substantial research effort in the speech processing community. A voice conversion system transforms an utterance from a source speaker into an utterance of a target speaker, keeping the content of the original utterance while replacing its vocal features with those of the target speaker. Existing solutions, e.g., StarGAN-VC2, produce promising results only when speech from the speakers involved is available during model training. AUTOVC can perform voice conversion on unseen speakers, but it requires an external pretrained speaker verification model. In this paper, we present a new GAN-based zero-shot voice conversion solution, called GAZEV, which aims to support unseen speakers for both source and target utterances. Our key technical contribution is the adoption of a speaker embedding loss on top of the GAN framework, together with an adaptive instance normalization strategy, to address the limitations of speaker identity transfer in existing solutions. Our empirical evaluations demonstrate a significant improvement in output speech quality and speaker similarity comparable to AUTOVC.
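
As a rough illustration of the adaptive instance normalization idea mentioned in the abstract, the sketch below normalizes each channel of a content feature map over time and then rescales it with a per-channel scale and shift derived from a speaker embedding. This is a minimal NumPy sketch of generic adaptive instance normalization, not the GAZEV implementation: the feature shapes, the random projection matrices `w_gamma` and `w_beta`, and the embedding size are all assumptions made for the example.

```python
import numpy as np


def adaptive_instance_norm(content, gamma, beta, eps=1e-5):
    """Generic AdaIN over a (channels, frames) feature map.

    Each channel is normalized to zero mean and unit variance across
    frames, then rescaled by gamma and shifted by beta (one per channel).
    """
    mean = content.mean(axis=1, keepdims=True)
    std = content.std(axis=1, keepdims=True)
    normalized = (content - mean) / (std + eps)
    return gamma[:, None] * normalized + beta[:, None]


rng = np.random.default_rng(0)

# Hypothetical content features: 4 channels, 100 frames.
content = rng.normal(loc=3.0, scale=2.0, size=(4, 100))

# Hypothetical 8-dimensional target speaker embedding.
speaker_embedding = rng.normal(size=8)

# Assumed linear projections from the embedding to per-channel scale/shift.
# In a trained model these would be learned; here they are random.
w_gamma = rng.normal(size=(4, 8))
w_beta = rng.normal(size=(4, 8))
gamma = w_gamma @ speaker_embedding
beta = w_beta @ speaker_embedding

converted = adaptive_instance_norm(content, gamma, beta)
```

After this step, the per-channel statistics of `converted` are set by the speaker embedding rather than by the source utterance, which is the sense in which the normalization carries speaker identity.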

research
03/18/2022

DGC-vector: A new speaker embedding for zero-shot voice conversion

Recently, more and more zero-shot voice conversion algorithms have been ...
research
06/19/2021

Improving robustness of one-shot voice conversion with deep discriminative speaker encoder

One-shot voice conversion has received significant attention since only ...
research
03/30/2022

Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE

Variational auto-encoder(VAE) is an effective neural network architectur...
research
11/23/2022

Voice-preserving Zero-shot Multiple Accent Conversion

Most people who have tried to learn a foreign language would have experi...
research
06/02/2021

NVC-Net: End-to-End Adversarial Voice Conversion

Voice conversion has gained increasing popularity in many applications o...
research
06/16/2021

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

Voice Conversion (VC) is a technique that aims to transform the non-ling...
research
09/23/2022

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm

Recent developments in neural speech synthesis and vocoding have sparked...
