Parallel WaveNet conditioned on VAE latent vectors

12/17/2020
by Jonas Rohnke, et al.

Recently, state-of-the-art text-to-speech synthesis systems have shifted to a two-model approach: a sequence-to-sequence model predicts a representation of speech (typically mel-spectrograms), and a 'neural vocoder' model then produces the time-domain speech waveform from this intermediate representation. This approach can synthesize speech that is confusable with natural speech recordings. However, the inference speed of neural vocoders is a major obstacle to deploying this technology in commercial applications. Parallel WaveNet is one approach developed to address this issue, trading off some synthesis quality for significantly faster inference. In this paper, we investigate the use of a sentence-level conditioning vector to improve the signal quality of a Parallel WaveNet neural vocoder. We condition the neural vocoder on the latent vector from a pre-trained VAE component of a Tacotron 2-style sequence-to-sequence model. With this conditioning, we are able to significantly improve the quality of vocoded speech.
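As a rough illustration of what sentence-level conditioning can look like, the sketch below repeats each mel-spectrogram frame up to the waveform sample rate and appends the same latent vector to every time step. This is not the authors' implementation: the function name `build_conditioning`, the `hop_length` parameter, upsampling by plain repetition, and combining the two inputs by concatenation are all assumptions made for illustration. The abstract does not say how the latent vector is injected into the vocoder.

```python
from typing import List, Sequence


def build_conditioning(
    mel_frames: Sequence[Sequence[float]],
    latent: Sequence[float],
    hop_length: int,
) -> List[List[float]]:
    """Build per-sample conditioning features (hypothetical helper).

    mel_frames: one feature vector per spectrogram frame.
    latent:     a single sentence-level vector (e.g. a VAE latent).
    hop_length: number of waveform samples covered by one frame.

    Returns one feature vector per waveform sample, made of the
    frame's mel features followed by the sentence-level latent.
    """
    if hop_length < 1:
        raise ValueError("hop_length must be at least 1")

    conditioning: List[List[float]] = []
    for frame in mel_frames:
        # The latent is constant over the whole sentence, so the same
        # values are appended to every frame.
        row = list(frame) + list(latent)
        # Assumption: upsample to sample rate by repeating the frame.
        # Each repeat is a fresh copy so rows can be edited safely.
        for _ in range(hop_length):
            conditioning.append(list(row))
    return conditioning


if __name__ == "__main__":
    mel = [[0.1, 0.2], [0.3, 0.4]]   # 2 frames, 2 mel bins
    z = [9.0]                        # 1-dimensional latent
    cond = build_conditioning(mel, z, hop_length=3)
    print(len(cond), len(cond[0]))
```

The point of the sketch is only the shape of the data: frame-level features vary over time, while the sentence-level vector is identical at every sample.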

research
11/06/2020

Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

We describe a sequence-to-sequence neural network which can directly gen...
research
04/08/2019

GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram

Recent advances in neural network-based text-to-speech have reached hum...
research
06/11/2020

FastPitch: Parallel Text-to-speech with Pitch Prediction

We present FastPitch, a fully-parallel text-to-speech model based on Fas...
research
09/14/2020

Controllable neural text-to-speech synthesis using intuitive prosodic features

Modern neural text-to-speech (TTS) synthesis can generate speech that is...
research
09/23/2019

Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities

Modern sequence to sequence neural TTS systems provide close to natural ...
research
06/29/2020

Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis

Recent advances in deep learning methods have elevated synthetic speech ...
