Speech waveform synthesis from MFCC sequences with generative adversarial networks

04/03/2018
by Lauri Juvela, et al.

This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as automatic speech recognition (ASR) but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from the MFCCs with an autoregressive recurrent neural network. Second, the spectral envelope information contained in the MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network-based noise model that adds a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained given only MFCC information at test time.
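The abstract does not say how the MFCC envelope is converted to all-pole filters. The sketch below shows one common route, not necessarily the authors' procedure: invert the DCT to recover log mel energies, map them to a linear-frequency power spectrum with the pseudo-inverse of the mel filterbank, take the inverse FFT to obtain an autocorrelation sequence, and run the Levinson-Durbin recursion. Every name and setting here is an assumption made for illustration: the helper functions, the MFCC convention (orthonormal DCT-II of natural-log mel energies), the 16 kHz sampling rate, the 24-band filterbank, the filter order, and the relative spectral floor.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank, shape (n_mels, n_fft // 2 + 1). Assumed design."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is its inverse."""
    idx = np.arange(n)
    D = np.sqrt(2.0 / n) * np.cos(np.pi * idx[:, None] * (2 * idx + 1) / (2 * n))
    D[0] /= np.sqrt(2.0)
    return D

def levinson(r, order):
    """Levinson-Durbin: autocorrelation r[0..order] -> (a, prediction error), a[0] = 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def mfcc_to_allpole(mfcc, fb, order, rel_floor=1e-6):
    """Approximate all-pole envelope from one MFCC frame (illustrative only)."""
    n_mels = fb.shape[0]
    c = np.zeros(n_mels)
    c[:mfcc.size] = mfcc                    # zero-pad truncated cepstrum
    mel_energy = np.exp(dct_matrix(n_mels).T @ c)
    power = np.linalg.pinv(fb) @ mel_energy
    # The pseudo-inverse can go negative; floor it so the spectrum stays positive.
    power = np.maximum(power, rel_floor * power.max())
    r = np.fft.irfft(power)                 # autocorrelation via Wiener-Khinchin
    return levinson(r[:order + 1], order)

# Usage with a random frame standing in for a real 13-coefficient MFCC vector.
fb = mel_filterbank(n_mels=24, n_fft=512, sr=16000)
frame = np.random.default_rng(0).normal(size=13)
a, err = mfcc_to_allpole(frame, fb, order=12)
print(a.shape, err > 0)
```

Flooring the recovered power spectrum keeps the autocorrelation positive definite, which is what guarantees that the recursion yields a stable filter. The pseudo-inverse step is lossy, so the resulting envelope is only an approximation of the spectrum the MFCCs were computed from.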

research
03/14/2019

Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis

Recent studies have shown that text-to-speech synthesis quality can be i...
research
10/25/2018

Reducing over-smoothness in speech synthesis using Generative Adversarial Networks

Speech synthesis is widely used in many practical applications. In recen...
research
08/31/2018

Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks

Most methods of voice restoration for patients suffering from aphonia ei...
research
11/02/2021

Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion

Whispered speech is a special way of pronunciation without using vocal c...
research
01/19/2021

Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss

This paper proposes a spectral-domain perceptual weighting technique for...
research
10/25/2019

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

We propose Parallel WaveGAN, a distillation-free, fast, and small-footpr...
research
12/29/2019

The Deterministic plus Stochastic Model of the Residual Signal and its Applications

The modeling of speech production often relies on a source-filter approa...
