Robust One-Shot Singing Voice Conversion

10/20/2022
by   Naoya Takahashi, et al.

Many existing works on singing voice conversion (SVC) require clean recordings of the target singer's voice for training. However, such recordings are often difficult to collect in advance, and singing voices are often distorted by reverb and accompaniment music. In this work, we propose robust one-shot SVC (ROSVC), which performs any-to-any SVC robustly even on such distorted singing voices using less than 10 s of a reference voice. To this end, we propose a two-stage training method called Robustify. In the first stage, a novel one-shot SVC model based on a generative adversarial network is trained on clean data to ensure high-quality conversion. In the second stage, enhancement modules are introduced to the encoders of the model to improve the robustness against distortions in the feature space. Experimental results show that the proposed method outperforms one-shot SVC baselines for both seen and unseen singers and greatly improves the robustness against the distortions.
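The second-stage idea can be illustrated with a toy sketch. It is not the paper's implementation: the linear "encoder", the constant additive "distortion", the bias-only "enhancement module", and the feature-matching loss are all assumptions chosen to keep the example small, and every function name below is hypothetical. The sketch keeps an encoder frozen (standing in for the result of the first, clean-data stage) and trains only a correction on its output so that features of distorted input move toward features of clean input.

```python
def encode(weights, x):
    """Toy frozen encoder: a fixed linear map from input to features."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def distort(x, offset):
    """Toy distortion (assumption): a constant additive offset on the input."""
    return [xi + oi for xi, oi in zip(x, offset)]


def train_enhancer(weights, clean_batch, offset, steps=200, lr=0.1):
    """Learn a feature-space bias correction while the encoder stays frozen.

    Minimizes the mean squared gap between corrected features of the
    distorted input and features of the clean input (assumed loss).
    """
    correction = [0.0] * len(weights)
    for _ in range(steps):
        grad = [0.0] * len(correction)
        for x in clean_batch:
            target = encode(weights, x)
            feats = encode(weights, distort(x, offset))
            for i in range(len(correction)):
                # Gradient of (feats[i] + correction[i] - target[i]) ** 2
                grad[i] += 2.0 * (feats[i] + correction[i] - target[i])
        n = len(clean_batch)
        correction = [c - lr * g / n for c, g in zip(correction, grad)]
    return correction


def feature_gap(weights, clean_batch, offset, correction):
    """Mean squared distance between corrected distorted and clean features."""
    total = 0.0
    for x in clean_batch:
        target = encode(weights, x)
        feats = encode(weights, distort(x, offset))
        total += sum((f + c - t) ** 2 for f, c, t in zip(feats, correction, target))
    return total / len(clean_batch)


if __name__ == "__main__":
    weights = [[1.0, 0.5], [-0.5, 2.0]]          # frozen "stage 1" encoder
    batch = [[0.0, 1.0], [1.0, -1.0], [2.0, 0.5]]
    offset = [0.3, -0.2]
    before = feature_gap(weights, batch, offset, [0.0, 0.0])
    learned = train_enhancer(weights, batch, offset)
    after = feature_gap(weights, batch, offset, learned)
    print("feature gap before:", before, "after:", after)
```

In this toy setting the correction only has to undo a constant shift, so the gap shrinks to nearly zero. A real enhancement module would be a learned network and the distortions (reverb, accompaniment) would be signal dependent.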

research
10/14/2021

Toward Degradation-Robust Voice Conversion

Any-to-any voice conversion technologies convert the vocal timbre of an ...
research
10/27/2022

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

Voice conversion (VC) can be achieved by first extracting source content...
research
02/15/2020

Many-to-Many Voice Conversion using Conditional Cycle-Consistent Adversarial Networks

Voice conversion (VC) refers to transforming the speaker characteristics...
research
07/18/2023

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

In recent years, large-scale pre-trained speech language models (SLMs) h...
research
09/21/2022

Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

Nonparallel multi-domain voice conversion methods such as the StarGAN-VC...
research
09/28/2021

Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme

Voice conversion is a common speech synthesis task which can be solved i...
research
09/15/2023

Controllable Residual Speaker Representation for Voice Conversion

Recently, there have been significant advancements in voice conversion, ...
