
PPG-based singing voice conversion with adversarial representation learning

by Zhonghao Li, et al.

Singing voice conversion (SVC) aims to convert the voice of one singer to that of another while preserving the sung content and melody. Building on recent voice conversion work, we propose a novel model that converts songs stably while preserving their naturalness and intonation. We build an end-to-end architecture that takes phonetic posteriorgrams (PPGs) as input and generates mel spectrograms. Specifically, we implement two separate encoders: one encodes PPGs as content, and the other compresses mel spectrograms to supply acoustic and musical information. To improve performance on timbre and melody, we design an adversarial singer confusion module and a mel-regressive representation learning module. Objective and subjective experiments are conducted on our private Chinese singing corpus. Compared with the baselines, our methods significantly improve conversion performance in terms of naturalness, melody, and voice similarity. Moreover, our PPG-based method proves robust to noisy sources.
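
As a rough illustration of how an adversarial singer confusion term and a mel-regression term might be combined with a reconstruction loss, here is a minimal sketch in plain Python. The abstract does not give the actual losses, so everything here is an assumption: the L1 distances, the sign-flipped cross-entropy used as the confusion term, the weights `adv_weight` and `reg_weight`, and all function names are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy of a singer classifier for one example."""
    return -math.log(softmax(logits)[target_index])


def l1_loss(predicted, target):
    """Mean absolute error between two equal-length mel vectors."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


def encoder_objective(mel_pred, mel_true, mel_reg_pred, singer_logits,
                      singer_id, adv_weight=0.1, reg_weight=1.0):
    """Hypothetical encoder-side objective (an assumption, not the paper's).

    recon:     decoder output vs. ground-truth mel
    mel_reg:   mel regressed from the mel-encoder representation
    singer_ce: classifier loss on the content representation; it is
               subtracted so the encoder is rewarded for confusing
               the singer classifier
    """
    recon = l1_loss(mel_pred, mel_true)
    mel_reg = l1_loss(mel_reg_pred, mel_true)
    singer_ce = cross_entropy(singer_logits, singer_id)
    return recon + reg_weight * mel_reg - adv_weight * singer_ce


if __name__ == "__main__":
    mel = [0.1, 0.2, 0.3]
    confused = encoder_objective(mel, mel, mel, [0.0, 0.0, 0.0, 0.0], 0)
    confident = encoder_objective(mel, mel, mel, [10.0, 0.0, 0.0, 0.0], 0)
    print(confused, confident)
```

Under these assumptions, the objective is lower when the singer classifier cannot identify the source singer from the content representation, which is the intuition behind removing timbre cues from the PPG branch. In a real system the classifier would be trained in alternation with the encoders to minimize the same cross-entropy.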




The IQIYI System for Voice Conversion Challenge 2020

This paper presents the IQIYI voice conversion system (T24) for Voice Co...

HiFi-VC: High Quality ASR-Based Voice Conversion

The goal of voice conversion (VC) is to convert input voice to match the...

Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels

This paper presents an end-to-end framework for the F0 transformation in ...

Learning the Beauty in Songs: Neural Singing Voice Beautifier

We are interested in a novel task, singing voice beautifying (SVB). Give...

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

Singing voice conversion is to convert a singer's voice to another one's...

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

Singing voice conversion (SVC) is one promising technique which can enri...

Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

Nonparallel multi-domain voice conversion methods such as the StarGAN-VC...