Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

06/16/2021
by Alejandro Mottini, et al.

Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance in order to change the perceived identity of the speaker. While there is a rich literature on VC, most proposed methods are trained and evaluated on clean speech recordings. However, many acoustic environments are noisy and reverberant, which severely restricts the applicability of popular VC methods in such scenarios. To address this limitation, we propose Voicy, a new VC framework tailored for noisy speech. Our method, which is inspired by the denoising auto-encoder framework, comprises four encoders (speaker, content, phonetic, and acoustic-ASR) and one decoder. Importantly, Voicy is capable of performing non-parallel zero-shot VC, a requirement for any VC system that needs to work on speakers not seen during training. We have validated our approach using a noisy reverberant version of the LibriSpeech dataset. Experimental results show that Voicy outperforms the other tested VC techniques in terms of naturalness and target speaker similarity in noisy reverberant environments.
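
To make the data flow in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes mel-spectrogram-like frame inputs, uses random linear maps as stand-ins for the four encoders and the decoder, and uses additive Gaussian noise as a crude stand-in for noise and reverberation. All names, dimensions, and the mean-squared-error loss (`make_linear`, `convert`, `denoising_loss`, `N_MELS`, `EMB`) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MELS = 80  # hypothetical number of spectral bins per frame
EMB = 16     # hypothetical embedding size per encoder


def make_linear(in_dim, out_dim):
    """Return a random linear map standing in for a trained network."""
    w = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return lambda x: x @ w


# The four encoders and one decoder named in the abstract.
# Their internals here are placeholders, not the paper's architectures.
speaker_enc = make_linear(N_MELS, EMB)
content_enc = make_linear(N_MELS, EMB)
phonetic_enc = make_linear(N_MELS, EMB)
acoustic_asr_enc = make_linear(N_MELS, EMB)
decoder = make_linear(4 * EMB, N_MELS)


def convert(source_frames, target_frames):
    """Decode the source utterance using a speaker embedding from the target.

    Zero-shot here means the speaker embedding is computed from the given
    target utterance at inference time, with no per-speaker training.
    """
    n_frames = source_frames.shape[0]
    # Assumption: one utterance-level speaker vector, averaged over time.
    spk = speaker_enc(target_frames).mean(axis=0, keepdims=True)
    spk = np.repeat(spk, n_frames, axis=0)
    feats = np.concatenate(
        [
            spk,
            content_enc(source_frames),
            phonetic_enc(source_frames),
            acoustic_asr_enc(source_frames),
        ],
        axis=1,
    )
    return decoder(feats)


def denoising_loss(clean_frames, noise_std=0.1):
    """Denoising auto-encoder objective: corrupt the input, reconstruct clean."""
    noisy = clean_frames + noise_std * rng.standard_normal(clean_frames.shape)
    recon = convert(noisy, noisy)  # self-reconstruction during training
    return float(np.mean((recon - clean_frames) ** 2))


clean = rng.standard_normal((50, N_MELS))   # toy source utterance
target = rng.standard_normal((30, N_MELS))  # toy utterance from another speaker
converted = convert(clean, target)
loss = denoising_loss(clean)
```

In this sketch, training would minimize `denoising_loss` over many utterances, while conversion at inference time calls `convert` with an utterance from a speaker that was never seen during training.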

research
10/24/2020

GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus

Non-parallel many-to-many voice conversion is recently attracting huge ...
research
12/04/2021

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

YourTTS brings the power of a multilingual approach to the task of zero-...
research
11/06/2021

SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines

Nowadays, as more and more systems achieve good performance in tradition...
research
03/30/2022

Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE

Variational auto-encoder (VAE) is an effective neural network architectur...
research
09/23/2022

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm

Recent developments in neural speech synthesis and vocoding have sparked...
research
12/08/2021

Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker ...
research
11/13/2021

Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion

Beyond the conventional voice conversion (VC) where the speaker informat...
