Zero-shot Singing Technique Conversion

11/16/2021
by Brendan O'Connor, et al.

In this paper we propose modifications to the neural network framework AutoVC for the task of singing technique conversion. These include utilising a pretrained singing technique encoder that extracts technique information, on which a decoder is conditioned during training. By swapping a source singer's technique information for that of the target during conversion, the input spectrogram is reconstructed with the target's technique. We document the beneficial effects of omitting the latent loss, the importance of sequential training, and our process for fine-tuning the bottleneck. We also conducted a listening study in which participants rated the specificity of technique-converted voices as well as their naturalness. From this we are able to conclude how effective the technique conversions are and how different conditions affect them, while assessing the model's ability to reconstruct its input data.
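
The conversion step described in the abstract can be illustrated with a toy data-flow sketch. It is not the authors' implementation: the function names, the frame-skipping bottleneck, and the concatenation-based conditioning are all assumptions made for illustration, and the neural decoder itself is omitted. The sketch only shows how a content code taken from the source is paired with a technique embedding taken from the target.

```python
from typing import List

Frame = List[float]


def encode_content(spec: List[Frame], factor: int) -> List[Frame]:
    """Hypothetical bottleneck: keep every `factor`-th frame of the spectrogram."""
    return spec[::factor]


def build_decoder_input(
    content: List[Frame], technique: Frame, n_frames: int, factor: int
) -> List[Frame]:
    """Upsample the content code by repetition and append the technique
    embedding to every frame (assumed conditioning scheme)."""
    upsampled = [content[min(i // factor, len(content) - 1)] for i in range(n_frames)]
    return [frame + technique for frame in upsampled]


def convert(spec: List[Frame], target_technique: Frame, factor: int = 2) -> List[Frame]:
    """Pair the source's content code with the target's technique embedding."""
    content = encode_content(spec, factor)
    return build_decoder_input(content, target_technique, len(spec), factor)
```

In this sketch, passing the source's own technique embedding to `convert` corresponds to reconstruction, while passing the target's embedding corresponds to conversion; a narrower bottleneck (larger `factor`) leaves less room for source technique information to leak through the content code.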

research
04/13/2021

NoiseVC: Towards High Quality Zero-Shot Voice Conversion

Voice conversion (VC) is a task that transforms voice from target audio ...
research
08/19/2023

Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

Singing technique conversion (STC) refers to the task of converting from...
research
05/31/2021

StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts

Voice conversion is the task of converting a spoken utterance from a sou...
research
12/03/2019

Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

We propose a flexible framework that deals with both singer conversion a...
research
02/27/2023

A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

Previous research has shown that established techniques for spoken voice...
research
04/13/2019

Unsupervised Singing Voice Conversion

We present a deep learning method for singing voice conversion. The prop...
research
09/22/2022

INFINITY: A Simple Yet Effective Unsupervised Framework for Graph-Text Mutual Conversion

Graph-to-text (G2T) generation and text-to-graph (T2G) triple extraction...
