Toward Degradation-Robust Voice Conversion

10/14/2021
by Chien-yu Huang, et al.

Any-to-any voice conversion technologies convert the vocal timbre of an utterance to that of any speaker, including speakers unseen during training. Although several state-of-the-art any-to-any voice conversion models exist, they all rely on clean utterances to convert successfully. In real-world scenarios, however, it is difficult to collect clean utterances of a speaker, as recordings are usually degraded by noise or reverberation. It is therefore highly desirable to understand how these degradations affect voice conversion and to build a degradation-robust model. In this paper we report the first comprehensive study of the degradation robustness of any-to-any voice conversion. We show that the performance of current state-of-the-art models is severely hampered when they are given degraded utterances. To address this, we propose speech enhancement concatenation and denoising training to improve robustness. In addition to common degradations, we also consider adversarial noise, which alters the model output significantly yet is imperceptible to humans. We show that both concatenation with off-the-shelf speech enhancement models and denoising training of voice conversion models improve robustness, although each approach has its pros and cons.
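
The abstract does not say how the degraded utterances were produced. As a rough illustration only, additive noise of the kind mentioned above is commonly simulated by scaling a noise signal so that the mixture reaches a chosen signal-to-noise ratio (SNR). The sketch below assumes that setup; the function names `mix_at_snr` and `measured_snr_db`, the 220 Hz test tone, the Gaussian noise, and the 10 dB target are all hypothetical choices for this example and are not taken from the paper.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + gain * noise, with gain chosen to hit the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0.0 or noise_power == 0.0:
        raise ValueError("signals must have non-zero power")
    # Solve clean_power / (gain^2 * noise_power) == 10^(snr_db / 10) for gain.
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR of a mixture, treating (noisy - clean) as the noise."""
    clean_power = sum(x * x for x in clean) / len(clean)
    residual_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(clean_power / residual_power)


# Hypothetical stand-ins for real audio: one second of a 220 Hz tone at 16 kHz
# as the "clean utterance", and white Gaussian noise as the degradation.
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 220.0 * t / 16000.0) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

In a denoising-style training setup, a mixture such as `noisy` would typically serve as the model input while `clean` remains the target; whether the paper uses exactly this pairing is an assumption here.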

research
05/18/2020

Defending Your Voice: Adversarial Attack on Voice Conversion

Substantial improvements have been achieved in recent years in voice con...
research
10/20/2022

Robust One-Shot Singing Voice Conversion

Many existing works on singing voice conversion (SVC) require clean reco...
research
11/24/2020

How Far Are We from Robust Voice Conversion: A Survey

Voice conversion technologies have been greatly improved in recent years...
research
12/22/2016

Robustness of Voice Conversion Techniques Under Mismatched Conditions

Most of the existing studies on voice conversion (VC) are conducted in a...
research
09/14/2019

Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech

Voice conversion (VC) and text-to-speech (TTS) are two tasks that share ...
research
10/27/2020

FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention

Any-to-any voice conversion aims to convert the voice from and to any sp...
research
06/30/2022

An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions

This paper presents a new voice conversion (VC) framework capable of dea...
