GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram

04/08/2019
by Lauri Juvela, et al.

Recent advances in neural-network-based text-to-speech have reached human-level naturalness in synthetic speech. Current sequence-to-sequence models can directly map text to mel-spectrogram acoustic features, which are convenient for modeling but present additional challenges for vocoding (i.e., waveform generation from the acoustic features). High-quality synthesis can be achieved with neural vocoders such as WaveNet, but such autoregressive models suffer from slow sequential inference. Meanwhile, their existing parallel-inference counterparts are difficult to train and require increasingly large model sizes. In this paper, we propose an alternative training strategy for a parallel neural vocoder utilizing generative adversarial networks, and integrate a linear predictive synthesis filter into the model. Results show that the proposed model achieves a significant improvement in inference speed while outperforming a WaveNet in copy-synthesis quality.
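
The abstract's central architectural idea is to place a linear predictive (LP) synthesis filter after a neural excitation generator. The sketch below illustrates only the classical LP part, under stated assumptions: it uses an all-pole filter 1/A(z) whose coefficients are estimated from an autocorrelation sequence with the Levinson-Durbin recursion, and white Gaussian noise stands in for the GAN-generated excitation. It does not show how GELP derives the filter from the mel-spectrogram or how it trains the excitation generator, and all function names are hypothetical.

```python
import random
from typing import List, Tuple


def autocorr(x: List[float], max_lag: int) -> List[float]:
    """Unnormalised autocorrelation r[0..max_lag] (LP solution is scale-invariant)."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the LP normal equations; returns a = [1, a1..ap] and residual power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        # Update a_1..a_{i-1} from the previous stage, then set a_i = k.
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lp_analysis(x: List[float], a: List[float]) -> List[float]:
    """FIR inverse filter A(z): e[n] = x[n] + sum_j a[j] * x[n - j]."""
    return [x[n] + sum(a[j] * x[n - j] for j in range(1, len(a)) if n - j >= 0)
            for n in range(len(x))]


def lp_synthesis(e: List[float], a: List[float]) -> List[float]:
    """All-pole synthesis filter 1/A(z): y[n] = e[n] - sum_j a[j] * y[n - j]."""
    y: List[float] = []
    for n, sample in enumerate(e):
        acc = sample
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    # White noise stands in for the GAN-generated excitation (assumption).
    excitation = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    true_a = [1.0, -1.3, 0.6]  # a stable, hypothetical 2nd-order envelope
    speech_like = lp_synthesis(excitation, true_a)

    # Re-estimate the envelope from the filtered signal.
    est_a, _ = levinson_durbin(autocorr(speech_like, 2), order=2)
    print("estimated a:", [round(c, 2) for c in est_a])

    # Inverse filtering recovers the excitation sample-for-sample.
    residual = lp_analysis(speech_like, true_a)
    print("max error:", max(abs(u - v) for u, v in zip(residual, excitation)))
```

Because the filter imposes the spectral envelope, a network in this kind of design only has to produce the excitation. The sample-by-sample recursion in this pure-Python sketch is written for clarity and says nothing about the inference speed reported in the paper.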


research
10/30/2018

Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks

The state-of-the-art in text-to-speech synthesis has recently improved c...
research
04/07/2018

A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis

Recent advances in speech synthesis suggest that limitations such as the...
research
07/31/2018

Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

Recent neural networks such as WaveNet and sampleRNN that learn directly...
research
12/17/2020

Parallel WaveNet conditioned on VAE latent vectors

Recently the state-of-the-art text-to-speech synthesis systems have shif...
research
05/20/2020

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce h...
research
08/03/2020

A Spectral Energy Distance for Parallel Speech Synthesis

Speech synthesis is an important practical generative modeling problem t...