Anyone GAN Sing

02/22/2021
by Shreeviknesh Sankaran, et al.

Audio synthesis is increasingly being addressed with deep neural networks. With the introduction of Generative Adversarial Networks (GANs), another efficient and effective path has opened up to solve this problem. In this paper, we present a method to synthesize the singing voice of a person using a Convolutional Long Short-Term Memory (ConvLSTM) based GAN optimized using the Wasserstein loss function. Our work is inspired by WGANSing by Chandna et al. Our model takes as input consecutive frame-wise linguistic and frequency features, along with singer identity, and outputs vocoder features. We train the model on a dataset of 48 English songs sung and spoken by 12 non-professional singers. For inference, sequential blocks are concatenated using an overlap-add procedure. We evaluate the model using the Mel-Cepstral Distance metric and a subjective listening test with 18 participants.
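The abstract names two steps that are easy to illustrate: joining output blocks by overlap-add, and scoring with Mel-Cepstral Distance. The sketch below is illustrative only and is not taken from the paper. The function names, the triangular cross-fade window, the hop size, and the common MCD variant that skips the 0th (energy) coefficient are all assumptions.

```python
import numpy as np


def mel_cepstral_distance(ref, syn):
    """Mean MCD in dB between two (frames, coeffs) arrays.

    Assumption: the common variant that ignores coefficient 0 (energy).
    """
    ref = np.asarray(ref, dtype=float)
    syn = np.asarray(syn, dtype=float)
    if ref.shape != syn.shape:
        raise ValueError("ref and syn must be time-aligned with equal shapes")
    diff = ref[:, 1:] - syn[:, 1:]
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))


def overlap_add(blocks, hop):
    """Join equal-length (frames, features) blocks with a weighted cross-fade.

    Assumption: a triangular window, normalised by the summed weights so
    that overlapping regions are averaged rather than amplified.
    """
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    block_len, n_feat = blocks[0].shape
    if not 0 < hop <= block_len:
        raise ValueError("hop must be in (0, block_len] so every frame is covered")
    total = hop * (len(blocks) - 1) + block_len
    out = np.zeros((total, n_feat))
    weight_sum = np.zeros((total, 1))
    # Drop the zero endpoints of the Bartlett window so all weights are positive.
    window = np.bartlett(block_len + 2)[1:-1].reshape(-1, 1)
    for i, block in enumerate(blocks):
        start = i * hop
        out[start:start + block_len] += window * block
        weight_sum[start:start + block_len] += window
    return out / weight_sum
```

With a hop of half the block length, each interior frame is a weighted average of two overlapping predictions, which smooths discontinuities at block boundaries.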


03/26/2019

WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

We present a deep neural network based singing voice synthesizer, inspir...
05/02/2018

Text to Image Synthesis Using Generative Adversarial Networks

Generating images from natural language is one of the primary applicatio...
05/06/2017

Face Super-Resolution Through Wasserstein GANs

Generative adversarial networks (GANs) have received a tremendous amount...
09/13/2019

Spoken Speech Enhancement using EEG

In this paper we demonstrate spoken speech enhancement using electroence...
03/06/2019

Conditional GANs For Painting Generation

We examined the use of modern Generative Adversarial Nets to generate no...
01/09/2017

AdaGAN: Boosting Generative Models

Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an e...