A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis

04/07/2018
by Xin Wang, et al.
Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum-phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which new vocoding and acoustic modeling techniques can be compared fairly with conventional approaches by means of a large-scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model both performed better than a normal recurrent network, with the AR model performing best. An evaluation of vocoders using the same AR acoustic model demonstrated that a WaveNet vocoder outperformed classical source-filter-based vocoders. In particular, speech waveforms generated by the combination of the AR acoustic model and the WaveNet vocoder achieved a speech quality score similar to that of vocoded speech.
