A Post Auto-regressive GAN Vocoder Focused on Spectrum Fracture

04/12/2022
by Zhenxing Lu, et al.

Generative adversarial networks (GANs) have shown their superiority in real-time speech synthesis. Nevertheless, most of them use deep convolutional layers as their backbone, which may lose information about the preceding signal. Generating a speech signal, however, invariably requires preceding waveform samples for its reconstruction, and their absence can lead to artifacts in the generated speech. To resolve this conflict, we propose an improved model: a post auto-regressive (AR) GAN vocoder with a self-attention layer, which merges self-attention into an AR loop. This module does not participate in inference, but it helps the generator learn temporal dependencies within frames during training. Furthermore, an ablation study confirms the contribution of each part. Systematic experiments show that our model consistently improves both objective and subjective evaluation performance.
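
The abstract does not give the architecture, so the following is only a rough sketch of the general idea it describes: self-attention that lets each frame look back at preceding frames, placed in a branch that runs during training and is skipped at inference. Everything here is an assumption made for illustration, not the authors' model: the names `causal_self_attention` and `generator_step` are hypothetical, the query, key, and value projections are replaced by the identity, and the frames are plain lists of floats.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(frames):
    """Scaled dot-product self-attention with a causal mask.

    frames: list of equal-length feature vectors, one per time frame.
    Assumption: identity projections stand in for learned Q/K/V weights.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs = []
    for t, query in enumerate(frames):
        # Frame t may only attend to frames 0..t (no look-ahead).
        scores = [
            sum(q * k for q, k in zip(query, frames[s])) / scale
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible frames.
        mixed = [
            sum(w * frames[s][d] for s, w in enumerate(weights))
            for d in range(dim)
        ]
        outputs.append(mixed)
    return outputs


def generator_step(frames, training):
    """Hypothetical wrapper: the attention branch runs only in training."""
    if training:
        return frames, causal_self_attention(frames)
    # At inference the auxiliary branch is skipped entirely.
    return frames, None


frames = [[1.0, 0.0], [0.0, 1.0]]
_, aux_train = generator_step(frames, training=True)
_, aux_infer = generator_step(frames, training=False)
```

In this sketch the first frame can only attend to itself, so its output equals its input, while later frames become mixtures of themselves and earlier frames. Because the auxiliary branch returns nothing at inference, it adds no cost to generation in this toy setup.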

research
10/30/2018

Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks

The state-of-the-art in text-to-speech synthesis has recently improved c...
research
10/03/2019

Bootstrapping Conditional GANs for Video Game Level Generation

Generative Adversarial Networks (GANs) have shown impressive results fo...
research
05/21/2018

Self-Attention Generative Adversarial Networks

In this paper, we propose the Self-Attention Generative Adversarial Netw...
research
04/07/2018

A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis

Recent advances in speech synthesis suggest that limitations such as the...
research
02/15/2022

Speech Denoising in the Waveform Domain with Self-Attention

In this work, we present CleanUNet, a causal speech denoising model on t...
research
04/19/2021

NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets

In this paper, we present an update to the NISQA speech quality predicti...
research
10/15/2019

Neural Approximation of an Auto-Regressive Process through Confidence Guided Sampling

We propose a generic confidence-based approximation that can be plugged ...
