SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation

10/14/2021
by Feiyang Chen, et al.

High-fidelity singing voice synthesis is challenging for neural vocoders because of extremely long continuous pronunciation, high sampling rates, and strong expressiveness. Existing neural vocoders designed for text-to-speech cannot be directly applied to singing voice synthesis, because they produce glitches in the generated spectrogram and reconstruct high frequencies poorly. To tackle the difficulty of singing modeling, in this paper we propose SingGAN, a singing voice vocoder based on a generative adversarial network. Specifically, 1) SingGAN uses source excitation to alleviate the glitch problem in the spectrogram; and 2) SingGAN adopts multi-band discriminators and introduces a frequency-domain loss and a sub-band feature matching loss to supervise high-frequency reconstruction. To our knowledge, SingGAN is the first vocoder designed for high-fidelity multi-speaker singing voice synthesis. Experimental results show that SingGAN synthesizes singing voices of much higher quality than the previous method (a 0.41 MOS gain). Further experiments show that, combined with FastSpeech 2 as the acoustic model, SingGAN is highly robust in the singing voice synthesis pipeline and also performs well in speech synthesis.
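The abstract names source excitation and a frequency-domain loss without giving their formulas. As a rough illustration only, the sketch below shows (a) a sine-plus-noise excitation built from frame-level F0, in the style of neural source-filter (NSF) models, and (b) a multi-resolution STFT loss in the style of Parallel WaveGAN, used as a stand-in for a frequency-domain loss. Neither is claimed to be SingGAN's actual formulation; the function names, hop length, sample rate, amplitudes, and FFT resolutions are all hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sine_excitation(f0_frames, hop_length=256, sample_rate=24000,
                    amplitude=0.1, noise_std=0.003, seed=0):
    """Hypothetical NSF-style source excitation (not SingGAN's exact recipe).

    Upsamples frame-level F0 (Hz, 0 = unvoiced) to the sample level, emits a
    sine wave plus weak noise in voiced regions and Gaussian noise elsewhere.
    hop_length and sample_rate defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Nearest-neighbour upsampling: each frame value is repeated hop_length times.
    f0 = np.repeat(np.asarray(f0_frames, dtype=np.float64), hop_length)
    voiced = f0 > 0
    # Instantaneous phase = running sum of normalized frequency, so the sine
    # stays phase-continuous across frame boundaries.
    phase = 2.0 * np.pi * np.cumsum(f0 / sample_rate)
    voiced_part = amplitude * np.sin(phase) + rng.normal(0.0, noise_std, f0.shape)
    unvoiced_part = rng.normal(0.0, amplitude / 3.0, f0.shape)
    return np.where(voiced, voiced_part, unvoiced_part)


def stft_magnitude(x, n_fft, hop):
    """Magnitude spectrogram from Hann-windowed frames (requires len(x) >= n_fft)."""
    frames = sliding_window_view(x, n_fft)[::hop] * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=-1))


def multi_resolution_stft_loss(y_hat, y,
                               resolutions=((512, 128), (1024, 256), (2048, 512)),
                               eps=1e-7):
    """Parallel WaveGAN-style loss, a stand-in for a frequency-domain loss.

    Averages spectral convergence plus log-magnitude L1 over several
    (n_fft, hop) resolutions; the resolutions here are hypothetical.
    """
    total = 0.0
    for n_fft, hop in resolutions:
        s_hat = stft_magnitude(y_hat, n_fft, hop)
        s = stft_magnitude(y, n_fft, hop)
        # Frobenius-norm ratio: penalizes large spectral errors, incl. high bands.
        spectral_convergence = np.linalg.norm(s - s_hat) / (np.linalg.norm(s) + eps)
        # Log-magnitude L1: emphasizes low-energy (often high-frequency) bins.
        log_mag = np.mean(np.abs(np.log(s + eps) - np.log(s_hat + eps)))
        total += spectral_convergence + log_mag
    return total / len(resolutions)


if __name__ == "__main__":
    f0 = np.array([0.0, 220.0, 220.0, 0.0])  # toy F0 contour, one value per frame
    excitation = sine_excitation(f0)
    print(excitation.shape)  # (1024,)

    t = np.arange(24000) / 24000
    target = np.sin(2 * np.pi * 440.0 * t)
    noisy = target + 0.05 * np.random.default_rng(1).normal(size=t.shape)
    print(multi_resolution_stft_loss(target, target))  # 0.0
    print(multi_resolution_stft_loss(noisy, target) > 0)  # True
```

The multi-band discriminators and the sub-band feature matching loss mentioned in the abstract are not sketched here.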

Related research

12/20/2021
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus
High-fidelity multi-singer singing voice synthesis is challenging for ne...

10/23/2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation
Entertainment-oriented singing voice synthesis (SVS) requires a vocoder ...

10/26/2022
Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network
XiaoiceSing is a singing voice synthesis (SVS) system that aims at gener...

09/03/2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
High-fidelity singing voices usually require higher sampling rate (e.g.,...

09/21/2022
Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN
Singing voice synthesis (SVS) is the computer production of a human-like...

10/22/2020
Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss
The neural network (NN) based singing voice synthesis (SVS) systems requ...

02/26/2022
Revisiting Over-Smoothness in Text to Speech
Non-autoregressive text to speech (NAR-TTS) models have attracted much a...