Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

10/31/2022
by   Kun Song, et al.
0

In current two-stage neural text-to-speech (TTS) paradigm, it is ideal to have a universal neural vocoder, once trained, which is robust to imperfect mel-spectrogram predicted from the acoustic model. To this end, we propose Robust MelGAN vocoder by solving the original multi-band MelGAN's metallic sound problem and increasing its generalization ability. Specifically, we introduce a fine-grained network dropout strategy to the generator. With a specifically designed over-smooth handler which separates speech signal intro periodic and aperiodic components, we only perform network dropout to the aperodic components, which alleviates metallic sounding and maintains good speaker similarity. To further improve generalization ability, we introduce several data augmentation methods to augment fake data in the discriminator, including harmonic shift, harmonic noise and phase noise. Experiments show that Robust MelGAN can be used as a universal vocoder, significantly improving sound quality in TTS systems built on various types of data.

READ FULL TEXT

page 2

page 4

research
11/02/2022

DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

Recent development of neural vocoders based on the generative adversaria...
research
05/12/2022

Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation

This paper introduces a unified source-filter network with a harmonic-pl...
research
07/10/2023

The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-sp...
research
11/05/2022

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

End-to-end singing voice synthesis (SVS) model VISinger can achieve bett...
research
01/19/2022

MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription

Neural network based end-to-end Text-to-Speech (TTS) has greatly improve...
research
03/09/2021

Universal Undersampled MRI Reconstruction

Deep neural networks have been extensively studied for undersampled MRI ...

Please sign up or login with your details

Forgot password? Click here to reset