ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems

11/09/2018
by Eunwoo Song, et al.

This paper proposes ExcitNet, a WaveNet-based neural excitation model for statistical parametric speech synthesis systems. Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an autoregressive framework. However, they often suffer from noisy outputs because of the difficulty of capturing the complicated time-varying nature of speech signals. To improve modeling efficiency, the proposed ExcitNet vocoder employs an adaptive inverse filter to decouple spectral components from the speech signal. The residual component, i.e., the excitation signal, is then trained and generated within the WaveNet framework. In this way, the quality of the synthesized speech can be further improved, since the spectral component is well represented by a deep learning framework and the residual component is efficiently generated by the WaveNet framework. Experimental results show that the proposed ExcitNet vocoder, trained both speaker-dependently and speaker-independently, outperforms traditional linear prediction vocoders and similarly configured conventional WaveNet vocoders.
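
The decoupling step described in the abstract can be illustrated with a minimal sketch, assuming the inverse filter is a linear-prediction (LP) filter. The function names (`lpc_coefficients`, `inverse_filter`, `synthesis_filter`) and the synthetic test signal are hypothetical and not taken from the paper. The sketch uses one fixed set of coefficients for the whole signal, whereas an adaptive filter would update them over time, and the WaveNet model that would generate the excitation is not shown.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate LP coefficients a[1..order] with the autocorrelation method."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with a tiny ridge for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1:])


def inverse_filter(signal, a):
    """Residual e[n] = s[n] - sum_k a[k] * s[n-k] (removes the spectral envelope)."""
    order = len(a)
    padded = np.concatenate([np.zeros(order), signal])
    e = np.empty(len(signal))
    for n in range(len(signal)):
        past = padded[n : n + order][::-1]  # s[n-1], ..., s[n-order]
        e[n] = signal[n] - np.dot(a, past)
    return e


def synthesis_filter(excitation, a):
    """All-pole filter s[n] = e[n] + sum_k a[k] * s[n-k] (restores the envelope)."""
    order = len(a)
    out = np.zeros(len(excitation) + order)
    for n in range(len(excitation)):
        past = out[n : n + order][::-1]  # previously generated samples
        out[n + order] = excitation[n] + np.dot(a, past)
    return out[order:]


# Synthetic stand-in for speech (an assumption): a stable AR(2) process.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
speech = np.zeros(4000)
for n in range(4000):
    s1 = speech[n - 1] if n >= 1 else 0.0
    s2 = speech[n - 2] if n >= 2 else 0.0
    speech[n] = 1.3 * s1 - 0.6 * s2 + noise[n]

a = lpc_coefficients(speech, order=2)
excitation = inverse_filter(speech, a)   # the signal a neural model would learn
recon = synthesis_filter(excitation, a)  # passing it back recovers the waveform

print("LP coefficients:", a)
print("variance ratio (excitation / speech):", np.var(excitation) / np.var(speech))
```

In this sketch the excitation has a flatter spectrum and lower variance than the original signal, which is one intuition for why modeling the residual may be easier than modeling the raw waveform.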

research
11/08/2018

Speaker-adaptive neural vocoders for statistical parametric speech synthesis systems

This paper proposes speaker-adaptive neural vocoders for statistical par...
research
08/15/2022

Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0

Neural network-based Text-to-Speech has significantly improved the quali...
research
11/29/2018

LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis

We propose a linear prediction (LP)-based waveform generation method via...
research
01/02/2020

Eigenresiduals for improved Parametric Speech Synthesis

Statistical parametric speech synthesizers have recently shown their abi...
research
03/31/2022

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

Neural vocoder using denoising diffusion probabilistic model (DDPM) has ...
research
12/29/2019

The Deterministic plus Stochastic Model of the Residual Signal and its Applications

The modeling of speech production often relies on a source-filter approa...
research
07/28/2018

Analysing Shortcomings of Statistical Parametric Speech Synthesis

Output from statistical parametric speech synthesis (SPSS) remains notic...