Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss

01/19/2021
by Eunwoo Song, et al.

This paper proposes a spectral-domain perceptual weighting technique for Parallel WaveGAN-based text-to-speech (TTS) systems. The recently proposed Parallel WaveGAN vocoder successfully generates waveform sequences using a fast non-autoregressive WaveNet model. By combining multi-resolution short-time Fourier transform (MR-STFT) criteria with a generative adversarial network, the lightweight convolutional networks can be trained effectively without any distillation process. To further improve vocoding performance, we propose applying frequency-dependent weighting to the MR-STFT loss function. The proposed method penalizes perceptually sensitive errors in the frequency domain; thus, the model is optimized toward reducing audible noise in the synthesized speech. Subjective listening test results demonstrate that our proposed method achieves TTS mean opinion scores (MOS) of 4.21 and 4.26 for female and male Korean speakers, respectively.
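To make the idea of a frequency-weighted MR-STFT loss concrete, here is a minimal NumPy sketch. It assumes the standard Parallel WaveGAN formulation (spectral convergence plus mean absolute log-magnitude error, averaged over several STFT resolutions) and simply multiplies every per-bin error by a weight vector. The resolution triples (FFT size, hop, window length) are assumed values following those commonly cited for Parallel WaveGAN, and `tilt_weight` is a hypothetical illustrative curve, not the perceptual weighting derived in this paper. NumPy is not differentiable, so this only shows how the loss value is computed; a training implementation would use an autodiff framework.

```python
import numpy as np

# Assumed (fft_size, hop_size, win_length) triples, one per STFT resolution.
RESOLUTIONS = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]


def stft_magnitude(x, fft_size, hop_size, win_length):
    """Magnitude STFT: frame the signal, apply a Hann window, take the rfft."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(x) - win_length) // hop_size
    frames = np.stack([x[i * hop_size:i * hop_size + win_length] * window
                       for i in range(n_frames)])
    # rfft zero-pads each frame to fft_size, giving fft_size // 2 + 1 bins.
    spec = np.abs(np.fft.rfft(frames, n=fft_size, axis=-1))
    return np.maximum(spec, 1e-7)  # floor so the log below stays finite


def weighted_stft_loss(x, y, fft_size, hop_size, win_length, weight=None):
    """Spectral convergence + log-magnitude loss with per-bin weights.

    x is the reference waveform, y the generated one; weight has shape
    (fft_size // 2 + 1,) and broadcasts over frames. weight=None is uniform.
    """
    s = stft_magnitude(x, fft_size, hop_size, win_length)
    s_hat = stft_magnitude(y, fft_size, hop_size, win_length)
    if weight is None:
        weight = np.ones(s.shape[-1])
    # Frobenius-norm ratio of the weighted error to the weighted reference.
    sc = np.linalg.norm(weight * (s - s_hat)) / np.linalg.norm(weight * s)
    # Mean absolute weighted error between log magnitudes.
    mag = np.mean(np.abs(weight * (np.log(s) - np.log(s_hat))))
    return sc + mag


def multi_resolution_loss(x, y, weight_fn=None):
    """Average the weighted STFT loss over all resolutions.

    weight_fn (hypothetical hook) maps a bin count to a weight vector.
    """
    total = 0.0
    for fft_size, hop, win in RESOLUTIONS:
        w = None if weight_fn is None else weight_fn(fft_size // 2 + 1)
        total += weighted_stft_loss(x, y, fft_size, hop, win, w)
    return total / len(RESOLUTIONS)


def tilt_weight(n_bins):
    """Hypothetical weight curve that emphasizes low-frequency bins."""
    return np.linspace(1.5, 0.5, n_bins)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(16000)
    gen = ref + 0.01 * rng.standard_normal(16000)
    print("unweighted:", multi_resolution_loss(ref, gen))
    print("tilted:    ", multi_resolution_loss(ref, gen, weight_fn=tilt_weight))
```

With a uniform weight the sketch reduces to the ordinary MR-STFT loss, so the weighting acts purely as a frequency-dependent reshaping of the same criterion: bins with larger weights contribute more to the error the generator is pushed to reduce.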

