Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit

04/22/2020
by Tomoki Koriyama, et al.

This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling. A DGP is a Bayesian deep model that can be trained effectively while accounting for model complexity, and it is a kernel regression model with high expressive power. Previous studies showed that DGP-based speech synthesis outperformed neural network-based synthesis when both models used a feed-forward architecture. To improve the naturalness of synthetic speech, we show in this paper that a DGP can be applied to utterance-level modeling using a recurrent architecture. We adopt a simple recurrent unit (SRU) to realize the recurrent architecture, which allows fast speech parameter generation thanks to the highly parallelizable nature of the SRU. Objective and subjective evaluation results show that the proposed SRU-DGP-based speech synthesis outperforms not only the feed-forward DGP but also automatically tuned SRU- and long short-term memory (LSTM)-based neural networks.
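To illustrate why an SRU lends itself to fast generation, the following is a minimal sketch of one SRU layer in its commonly cited neural-network form, not the SRU-DGP model proposed in the paper. The function name `sru_layer`, the parameter names, and the assumption that input and output dimensions are equal (so the highway connection can add `x_t` directly) are all illustrative choices. The point it shows is that every matrix product depends only on the input frames, so it can be computed for the whole utterance at once, while the loop over time contains only cheap elementwise operations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matmul(X, W):
    """Multiply a (T x d_in) list of rows by a (d_in x d_out) matrix."""
    return [[sum(x[k] * W[k][j] for k in range(len(W)))
             for j in range(len(W[0]))] for x in X]


def sru_layer(X, W, Wf, Wr, vf, vr, bf, br):
    """Run one SRU layer over an utterance X of T frames with d features each.

    Assumed (illustrative) formulation:
      f_t = sigmoid(Wf x_t + vf * c_{t-1} + bf)   forget gate
      r_t = sigmoid(Wr x_t + vr * c_{t-1} + br)   reset (highway) gate
      c_t = f_t * c_{t-1} + (1 - f_t) * (W x_t)   internal state
      h_t = r_t * c_t + (1 - r_t) * x_t           output with highway connection
    """
    d = len(X[0])
    # Step 1: projections that do not depend on the previous state.
    # These cover all frames at once and could run in parallel.
    U, Uf, Ur = matmul(X, W), matmul(X, Wf), matmul(X, Wr)
    # Step 2: the only sequential part, using elementwise operations.
    c = [0.0] * d
    H = []
    for t in range(len(X)):
        # Both gates read the previous state, so compute them before updating c.
        f = [sigmoid(Uf[t][j] + vf[j] * c[j] + bf[j]) for j in range(d)]
        r = [sigmoid(Ur[t][j] + vr[j] * c[j] + br[j]) for j in range(d)]
        c = [f[j] * c[j] + (1.0 - f[j]) * U[t][j] for j in range(d)]
        H.append([r[j] * c[j] + (1.0 - r[j]) * X[t][j] for j in range(d)])
    return H
```

In an LSTM, by contrast, the matrix products themselves take the previous hidden state as input, so they cannot be hoisted out of the time loop in this way.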


