Neural source-filter-based waveform model for statistical parametric speech synthesis

10/29/2018
by Xin Wang, et al.

Neural waveform models such as WaveNet are used in many recent text-to-speech systems, but the original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure. Although faster non-AR models were recently reported, they may be prohibitively complicated because they rely on a distilling training method and a blend of other disparate training criteria. This study proposes a non-AR neural source-filter waveform model that can be directly trained using spectrum-based training criteria and the stochastic gradient descent method. Given the input acoustic features, the proposed model first uses a source module to generate a sine-based excitation signal and then uses a filter module to transform the excitation signal into the output speech waveform. Our experiments demonstrated that the proposed model generated waveforms at least 100 times faster than the AR WaveNet and that the quality of its synthetic speech was close to that of speech generated by the AR WaveNet. Ablation test results showed that both the sine-wave excitation signal and the spectrum-based training criteria were essential to the performance of the proposed model.
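
The two ingredients named in the abstract, a sine-based excitation and a spectrum-based training criterion, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-sample F0 input, the amplitude and noise values, the frame sizes, and the use of a plain log-power-spectrum distance are all assumptions chosen for illustration, and the neural filter module is omitted entirely.

```python
import cmath
import math
import random


def sine_excitation(f0_per_sample, sample_rate=16000, amplitude=0.1,
                    noise_std=0.003, seed=0):
    """Build a sine-based excitation from a per-sample F0 contour in Hz.

    Voiced samples (F0 > 0) get a sine whose phase is the running sum of
    the instantaneous frequency, plus a little Gaussian noise. Unvoiced
    samples (F0 == 0) get Gaussian noise only. All default values here
    are illustrative assumptions.
    """
    rng = random.Random(seed)
    phase = 0.0
    excitation = []
    for f0 in f0_per_sample:
        if f0 > 0:
            phase += 2.0 * math.pi * f0 / sample_rate
            sample = amplitude * math.sin(phase) + rng.gauss(0.0, noise_std)
        else:
            sample = rng.gauss(0.0, amplitude / 3.0)
        excitation.append(sample)
    return excitation


def _log_power_frames(signal, frame_len, hop, eps):
    """Log power spectra of Hann-windowed frames, using a naive DFT."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        windowed = [
            signal[start + n]
            * (0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len))
            for n in range(frame_len)
        ]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            x = sum(
                windowed[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(math.log(abs(x) ** 2 + eps))
        frames.append(spectrum)
    return frames


def log_spectral_distance(generated, natural, frame_len=64, hop=32, eps=1e-5):
    """Mean squared difference between the log power spectra of two signals.

    Both signals must have the same length, at least frame_len samples.
    """
    if len(generated) != len(natural):
        raise ValueError("signals must have the same length")
    gen = _log_power_frames(generated, frame_len, hop, eps)
    nat = _log_power_frames(natural, frame_len, hop, eps)
    total, count = 0.0, 0
    for g_frame, n_frame in zip(gen, nat):
        for g, n in zip(g_frame, n_frame):
            total += (g - n) ** 2
            count += 1
    return total / count


# Usage: compare a 100 Hz excitation against a 200 Hz one.
reference = sine_excitation([100.0] * 256, noise_std=0.0)
candidate = sine_excitation([200.0] * 256, noise_std=0.0)
distance = log_spectral_distance(candidate, reference)
```

In a trainable model the distance would be computed on the filter module's output with differentiable operations and minimized by stochastic gradient descent; the naive DFT above is only meant to make the criterion concrete.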

research
04/27/2019

Neural source-filter waveform models for statistical parametric speech synthesis

Neural waveform models such as WaveNet have demonstrated better performa...
research
02/15/2021

PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components

We propose PeriodNet, a non-autoregressive (non-AR) waveform generation ...
research
04/07/2018

A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis

Recent advances in speech synthesis suggest that limitations such as the...
research
10/26/2020

TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis

In this paper, we propose a text-to-speech (TTS)-driven data augmentatio...
research
04/22/2021

Restoring degraded speech via a modified diffusion model

There are many deterministic mathematical operations (e.g. compression, ...
research
05/15/2020

Reverberation Modeling for Source-Filter-based Neural Vocoder

This paper presents a reverberation module for source-filter-based neura...
research
03/05/2022

NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation

The traditional vocoders have the advantages of high synthesis efficienc...
