NoiseVC: Towards High Quality Zero-Shot Voice Conversion

04/13/2021
by Shijun Wang, et al.

Voice conversion (VC) is the task of transforming the voice in a source utterance into that of a target speaker without losing linguistic content. It is especially challenging when the source and target speakers are unseen during training (zero-shot VC). Previous approaches require a pre-trained model or linguistic data to perform zero-shot conversion. Meanwhile, VC models with Vector Quantization (VQ) or Instance Normalization (IN) are able to disentangle content from audio and achieve successful conversions. However, disentanglement in these models relies heavily on tightly constrained bottleneck layers, so sound quality is drastically sacrificed. In this paper, we propose NoiseVC, an approach that disentangles content based on VQ and Contrastive Predictive Coding (CPC). Additionally, Noise Augmentation is performed to further enhance the disentanglement capability. We conduct several experiments and demonstrate that NoiseVC has strong disentanglement ability with only a small sacrifice of quality.
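
As a rough illustration of the vector quantization bottleneck mentioned in the abstract, the sketch below maps each feature frame to its nearest codebook entry. This is generic VQ, not NoiseVC's actual implementation: the function name `quantize`, the toy 2-dimensional frames, and the three-entry codebook are all assumptions made for the example, and a real model would learn the codebook during training.

```python
def quantize(frame, codebook):
    """Return (index, codeword) of the codebook entry nearest to `frame`.

    Distance is squared Euclidean. Hypothetical helper for illustration only.
    """
    best = min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
    )
    return best, codebook[best]


# Toy codebook and frames (assumed values, not taken from the paper).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]

# Each continuous frame collapses to a discrete code index.
indices = [quantize(frame, codebook)[0] for frame in frames]
print(indices)
```

Because every frame is replaced by one of a small number of codewords, fine detail in the input is discarded, which is the kind of constrained bottleneck the abstract associates with both disentanglement and loss of sound quality.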

research
05/31/2021

StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts

Voice conversion is the task of converting a spoken utterance from a sou...
research
11/16/2021

Zero-shot Singing Technique Conversion

In this paper we propose modifications to the neural network framework, ...
research
05/26/2020

Adversarial Contrastive Predictive Coding for Unsupervised Learning of Disentangled Representations

In this work we tackle disentanglement of speaker and content related va...
research
07/05/2022

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

The zero-shot scenario for speech generation aims at synthesizing a nove...
research
09/18/2023

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

This paper presents a novel task, zero-shot voice conversion based on fa...
research
10/27/2021

Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning

Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, i...
research
09/28/2021

Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme

Voice conversion is a common speech synthesis task which can be solved i...
