Change your singer: a transfer learning generative adversarial framework for song to song conversion

11/07/2019
by   Rema Daher, et al.

Have you ever wondered how a song might sound if performed by a different artist? In this work, we propose SCM-GAN, an end-to-end non-parallel song conversion system powered by generative adversarial networks and transfer learning that lets users hear a selected target singer perform any song. SCM-GAN first separates songs into vocals and instrumental music using a U-Net network, then converts the vocal segments to the target singer using advanced CycleGAN-VC, and finally merges the converted vocals with their corresponding background music. SCM-GAN is first initialized with feature representations learned from a state-of-the-art voice-to-voice conversion model and then trained on a dataset of non-parallel songs. Furthermore, SCM-GAN is evaluated against a set of metrics including global variance (GV) and modulation spectra (MS) on the 24 Mel-cepstral coefficients (MCEPs). Transfer learning improves the GV by 35% and the MS by 13%. A subjective evaluation measures listener satisfaction with the quality and naturalness of the conversion. Results show above-par similarity between SCM-GAN's output and the target (70% on average) as well as great naturalness of the converted songs.
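To make the two objective metrics concrete, here is a minimal sketch of how GV and MS are commonly computed from a sequence of MCEP frames. It is not the authors' evaluation code: the function names, the list-of-frames layout, and the toy input are assumptions for illustration, and it uses one common definition of each metric (per-coefficient variance over time for GV, log power spectrum of each coefficient's temporal trajectory for MS). It sticks to the standard library to stay self-contained; a real evaluation would more likely use NumPy's FFT.

```python
import cmath
import math
from statistics import pvariance


def global_variance(mceps):
    """Per-coefficient variance over time (assumed layout: list of frames)."""
    # zip(*mceps) turns the frame list into one trajectory per coefficient.
    return [pvariance(track) for track in zip(*mceps)]


def modulation_spectrum(mceps, eps=1e-12):
    """Log power spectrum of each coefficient trajectory, via a naive DFT."""
    n = len(mceps)
    spectra = []
    for track in zip(*mceps):
        bins = []
        for k in range(n // 2 + 1):  # non-negative modulation frequencies only
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(track)
            )
            # eps (an assumed floor) avoids log(0) for empty bins.
            bins.append(math.log(abs(coeff) ** 2 + eps))
        spectra.append(bins)
    return spectra


# Toy input (hypothetical): 4 frames with 2 coefficients each.
frames = [[1.0, 0.0], [3.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
gv = global_variance(frames)
ms = modulation_spectrum(frames)
```

With 24 MCEPs per frame, `gv` would hold 24 values and `ms` 24 spectra. Comparing them between converted and target vocals is one way such metrics are typically used.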


