Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

12/03/2019
by   Yin-Jyun Luo, et al.
0

We propose a flexible framework that deals with both singer conversion and singers vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of variational autoencoders. It employs separate encoders to learn disentangled latent representations of singer identity and vocal technique separately, with a joint decoder for reconstruction. Conversion is carried out by simple vector arithmetic in the learned latent spaces. Both a quantitative analysis as well as a visualization of the converted spectrograms show that our model is able to disentangle singer identity and vocal technique and successfully perform conversion of these attributes. To the best of our knowledge, this is the first work to jointly tackle conversion of singer identity and vocal technique based on a deep learning approach.

READ FULL TEXT
research
04/15/2020

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

Non-parallel many-to-many voice conversion remains an interesting but ch...
research
01/21/2019

Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs

We present a simple neural rendering architecture that helps variational...
research
07/26/2021

Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations

Voice conversion (VC) consists of digitally altering the voice of an ind...
research
05/17/2022

How do Variational Autoencoders Learn? Insights from Representational Similarity

The ability of Variational Autoencoders (VAEs) to learn disentangled rep...
research
11/16/2021

Zero-shot Singing Technique Conversion

In this paper we propose modifications to the neural network framework, ...
research
08/19/2023

Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

Singing technique conversion (STC) refers to the task of converting from...
research
06/19/2019

Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders

In this paper, we learn disentangled representations of timbre and pitch...

Please sign up or login with your details

Forgot password? Click here to reset