Sampling-based speech parameter generation using moment-matching networks

04/12/2017
by   Shinnosuke Takamichi, et al.

This paper presents sampling-based speech parameter generation using moment-matching networks for Deep Neural Network (DNN)-based speech synthesis. Although a person never produces exactly the same speech twice, even when expressing the same linguistic and para-linguistic information, typical statistical speech synthesis always produces identical speech, i.e., there is no inter-utterance variation in synthetic speech. To give synthetic speech natural inter-utterance variation, this paper builds DNN acoustic models from which speech parameters can be randomly sampled. The DNNs are trained so that the moments of the generated speech parameters match those of natural speech parameters. Since the variation of the speech parameters is compressed into a low-dimensional, simple prior noise vector, the algorithm has a lower computational cost than directly sampling speech parameters. As a first step towards generating synthetic speech with natural inter-utterance variation, this paper investigates whether the proposed sampling-based generation degrades synthetic speech quality. In the evaluation, we compare the speech quality of conventional maximum likelihood-based generation with that of the proposed sampling-based generation. The results demonstrate that the proposed generation causes no degradation in speech quality.
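The training criterion behind moment-matching networks is typically a kernel-based Maximum Mean Discrepancy (MMD) loss: driving the MMD between generated and natural parameter samples toward zero matches their moments. The following NumPy sketch illustrates the core biased MMD estimate; the function names `gaussian_kernel` and `mmd2` are illustrative, not taken from the paper, and this is a simplified sketch rather than the authors' exact training setup.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of x and y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of the squared Maximum Mean Discrepancy between the
    # empirical distributions of x and y. Minimizing this with respect to
    # the generator's parameters matches the moments of generated and
    # natural speech parameters.
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

# Toy check: samples from the same distribution give a small MMD,
# samples from a shifted distribution give a larger one.
rng = np.random.default_rng(0)
natural = rng.normal(0.0, 1.0, size=(64, 8))       # stand-in for natural parameters
generated_good = rng.normal(0.0, 1.0, size=(64, 8))  # well-matched generator output
generated_bad = rng.normal(3.0, 1.0, size=(64, 8))   # mismatched generator output
```

In the paper's setting, the generator would be a DNN mapping linguistic features plus a low-dimensional prior noise vector to speech parameters, so that re-drawing the noise yields a different but equally natural utterance.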


Related research

- 02/09/2019: Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking
- 09/23/2017: Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks
- 02/22/2016: Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Generation Error Training
- 03/14/2019: Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis
- 07/10/2018: Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network
- 03/20/2022: Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise
- 07/06/2019: Towards Debugging Deep Neural Networks by Generating Speech Utterances
