Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks

10/30/2018
by Lauri Juvela, et al.

The state of the art in text-to-speech (TTS) synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet. However, these methods suffer from slow sequential inference, while their parallel versions are difficult to train and even more computationally expensive. Meanwhile, generative adversarial networks (GANs) have achieved impressive results in image generation and are making their way into audio applications; parallel inference is among their attractive properties. By adopting recent advances in GAN training techniques, this investigation studies waveform generation for TTS in two domains (speech signal and glottal excitation). Listening test results show that while direct waveform generation with a GAN is still far behind WaveNet, a GAN-based glottal excitation model can achieve quality and voice similarity on par with a WaveNet vocoder.

research
04/08/2019

GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram

Recent advances in neural network-based text-to-speech have reached hum...
research
11/13/2022

Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing

Most state-of-the-art Text-to-Speech systems use the mel-spectrogram as ...
research
04/12/2022

A Post Auto-regressive GAN Vocoder Focused on Spectrum Fracture

Generative adversarial networks (GANs) have indicated their superio...
research
04/28/2020

Conditional Spoken Digit Generation with StyleGAN

This paper adapts a StyleGAN model for speech generation with minimal or...
research
04/09/2019

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation

This paper proposes an effective probability density distillation (PDD) ...
research
08/03/2020

A Spectral Energy Distance for Parallel Speech Synthesis

Speech synthesis is an important practical generative modeling problem t...
