A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

02/27/2023
by Brendan O'Connor, et al.

Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component for a loss function that is otherwise well established in VC tasks, and show that it improves our model's SVC performance. We first trained a singer identity embedding (SIE) network on mel-spectrograms of singer recordings, using contrastive learning, to produce singer-specific variance encodings. We then trained a well-known autoencoder framework (AutoVC) conditioned on these SIEs and measured differences in SVC performance across different latent regressor loss components. We found that computing this loss with respect to SIEs leads to better performance than computing it with respect to bottleneck embeddings: the converted audio is more natural and more specific to the target singer. Including this loss component has the advantage of explicitly forcing the network to reconstruct with timbral similarity, and it also negates the effect of poor disentanglement in AutoVC's bottleneck embeddings. We demonstrate a peculiar divergence between computational and human evaluations of singer-converted audio clips, which highlights the necessity of both. We also propose a pitch-matching mechanism between source and target singers to ensure these evaluations are not influenced by differences in pitch register.
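To make the comparison in the abstract concrete, the following is a minimal sketch of the two latent regressor variants, assuming the loss is an L1 distance between a re-encoding of the converted audio and a reference embedding. It is not the authors' implementation: the encoders are toy random projections, and every name (`sie_encoder`, `content_encoder`, `latent_regressor_loss`, the dimensions) is hypothetical and chosen only for illustration.

```python
import math
import random

random.seed(0)
N_MELS = 8  # toy number of mel bins (assumption, not from the paper)


def random_matrix(n_out, n_in):
    """Random weights standing in for a trained network."""
    return [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]


def project(vec, weights):
    """Apply a tanh-squashed linear projection to one vector."""
    return [math.tanh(sum(v * w for v, w in zip(vec, row))) for row in weights]


W_SIE = random_matrix(4, N_MELS)      # hypothetical singer-identity weights
W_CONTENT = random_matrix(3, N_MELS)  # hypothetical bottleneck weights


def sie_encoder(mel):
    """Toy singer identity encoder: one utterance-level vector per clip."""
    mean_frame = [sum(f[i] for f in mel) / len(mel) for i in range(N_MELS)]
    return project(mean_frame, W_SIE)


def content_encoder(mel):
    """Toy bottleneck encoder: one code per frame, flattened."""
    return [c for frame in mel for c in project(frame, W_CONTENT)]


def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def latent_regressor_loss(encoder, converted_mel, reference_embedding):
    """Re-encode the converted output and compare it with a reference."""
    return l1_distance(encoder(converted_mel), reference_embedding)


def random_mel(n_frames):
    return [[random.gauss(0, 1) for _ in range(N_MELS)] for _ in range(n_frames)]


source_mel = random_mel(10)
target_mel = random_mel(10)
# Stand-in for the decoder output: the source plus a small perturbation.
converted_mel = [[v + 0.1 * random.gauss(0, 1) for v in f] for f in source_mel]

# Variant 1: regress onto the target singer's identity embedding (SIE).
loss_sie = latent_regressor_loss(sie_encoder, converted_mel, sie_encoder(target_mel))
# Variant 2: regress onto the source's bottleneck embedding.
loss_bottleneck = latent_regressor_loss(
    content_encoder, converted_mel, content_encoder(source_mel)
)
print(loss_sie, loss_bottleneck)
```

In a real training setup either term would be added, with a weight, to the reconstruction loss; the sketch only shows which embedding each variant compares against.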


research
08/19/2023

Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

Singing technique conversion (STC) refers to the task of converting from...
research
04/13/2019

Unsupervised Singing Voice Conversion

We present a deep learning method for singing voice conversion. The prop...
research
06/03/2019

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

End-to-end models for raw audio generation are a challenge, specially if...
research
01/26/2022

Invertible Voice Conversion

In this paper, we propose an invertible deep learning framework called I...
research
11/24/2020

How Far Are We from Robust Voice Conversion: A Survey

Voice conversion technologies have been greatly improved in recent years...
research
11/16/2021

Zero-shot Singing Technique Conversion

In this paper we propose modifications to the neural network framework, ...
research
03/12/2021

Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association

Nowadays, we have witnessed the early progress on learning the associati...
