Neural source-filter waveform models for statistical parametric speech synthesis

04/27/2019
by   Xin Wang, et al.

Neural waveform models such as WaveNet have demonstrated better performance than conventional vocoders for statistical parametric speech synthesis. As an autoregressive (AR) model, WaveNet is limited by a slow sequential waveform generation process. Some newer models based on the inverse-autoregressive flow (IAF) can generate a whole waveform in a one-shot manner, but these IAF-based models require sequential transformation during training, which severely slows down training. Other models such as Parallel WaveNet and ClariNet combine the benefits of AR and IAF-based models, training an IAF student by transferring knowledge from a pre-trained AR teacher without any sequential transformation. However, both models require additional training criteria, and their implementation is prohibitively complicated. We propose a framework for neural source-filter (NSF) waveform modeling that uses neither AR nor IAF-based approaches. This framework requires only three components for waveform generation: a source module that generates a sine-based signal as excitation, a non-AR dilated-convolution-based filter module that transforms the excitation into a waveform, and a condition module that pre-processes the acoustic features for the source and filter modules. The framework minimizes spectral-amplitude distances for model training, which can be efficiently implemented using short-time Fourier transform (STFT) routines. Under this framework, we designed three NSF models and compared them with WaveNet. The NSF models generated waveforms at least 100 times faster than WaveNet, and the quality of the synthetic speech from the best NSF model was as good as or better than that from WaveNet.
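To make the two key ingredients concrete, the following is a minimal NumPy sketch, not the authors' implementation: a sine-based excitation signal derived from frame-level F0 (sine for voiced frames, noise for unvoiced ones), and a log spectral-amplitude distance computed with STFT routines. The function names, the frame shift of 80 samples, and the noise and amplitude constants are illustrative assumptions.

```python
import numpy as np

def sine_excitation(f0, sr=16000, frame_shift=80, noise_std=0.003):
    """Generate a sine-based excitation from frame-level F0 (Hz).

    Unvoiced frames are marked by f0 == 0 and receive only noise.
    """
    # Upsample frame-level F0 to the sample level, then integrate
    # instantaneous frequency to obtain the phase of the sine wave.
    f0_up = np.repeat(np.asarray(f0, dtype=np.float64), frame_shift)
    phase = 2.0 * np.pi * np.cumsum(f0_up / sr)
    voiced = f0_up > 0
    # Sine for voiced samples, zeros for unvoiced; additive noise everywhere.
    e = np.where(voiced, 0.1 * np.sin(phase), 0.0)
    e = e + noise_std * np.random.randn(len(e))
    return e

def log_spectral_amplitude_distance(x, y, n_fft=512, hop=128):
    """Mean squared log-amplitude distance between two waveforms,
    computed over Hann-windowed STFT frames."""
    def stft_amplitude(sig):
        win = np.hanning(n_fft)
        frames = [
            np.abs(np.fft.rfft(sig[start:start + n_fft] * win))
            for start in range(0, len(sig) - n_fft + 1, hop)
        ]
        # Small floor avoids log(0) on silent frames.
        return np.stack(frames) + 1e-5

    ax, ay = stft_amplitude(x), stft_amplitude(y)
    return np.mean((np.log(ax) - np.log(ay)) ** 2)
```

In the actual framework this distance is evaluated at several STFT resolutions and back-propagated through the dilated-convolution filter module; the sketch above shows only a single resolution.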

Related research

- Neural source-filter-based waveform model for statistical parametric speech synthesis (10/29/2018)
- A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis (04/07/2018)
- A Neural Parametric Singing Synthesizer (04/12/2017)
- Reverberation Modeling for Source-Filter-based Neural Vocoder (05/15/2020)
- Differentiable Artificial Reverberation (05/28/2021)
- Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation (05/18/2020)
- LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis (11/29/2018)