Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis

02/01/2021
by Chenpeng Du, et al.

Recent research on both utterance-level and phone-level prosody modelling has successfully improved the voice quality and naturalness of text-to-speech synthesis. However, most of these methods model prosody with a unimodal distribution, such as a single Gaussian, which is an overly restrictive assumption. In this work, we focus on phone-level prosody modelling and introduce a mixture density network based on a Gaussian mixture model (GMM). Our experiments on the LJSpeech dataset demonstrate that a GMM models phone-level prosody better than a single Gaussian. Subjective evaluations suggest that our method not only significantly improves prosody diversity in synthetic speech without the need for manual control but also achieves better naturalness. We also find that the additional mixture density network has only a very limited influence on inference speed.
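The abstract does not describe the network itself, so what follows is only a minimal sketch of the generic mixture-density idea it builds on, not the authors' implementation. Several assumptions are made here: a network emits, per phone, a flat vector that is split into K mixture logits, K×D means and K×D log standard deviations (a hypothetical layout); the Gaussians have diagonal covariance; and training minimises the mixture's negative log-likelihood. The function names (`split_mdn_outputs`, `gmm_log_likelihood`, `mdn_nll`, `sample_gmm`) and the toy two-cluster 1-D data are invented for illustration. The demo compares a single moment-fitted Gaussian with a two-component mixture on the same toy data.

```python
import math
import random
from typing import List, Sequence, Tuple

Params = Tuple[List[float], List[List[float]], List[List[float]]]


def split_mdn_outputs(raw: Sequence[float], k: int, d: int) -> Params:
    """Turn a flat (hypothetical) MDN output into (weights, means, log_stds).

    Assumed layout: k mixture logits, then k*d means, then k*d log-stds.
    """
    assert len(raw) == k + 2 * k * d, "unexpected output size"
    logits = list(raw[:k])
    top = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    means = [list(raw[k + i * d : k + (i + 1) * d]) for i in range(k)]
    off = k + k * d
    log_stds = [list(raw[off + i * d : off + (i + 1) * d]) for i in range(k)]
    return weights, means, log_stds


def gmm_log_likelihood(x, weights, means, log_stds) -> float:
    """log p(x) under a diagonal-covariance GMM, combined via log-sum-exp."""
    comp = []
    for w, mu, ls in zip(weights, means, log_stds):
        lp = math.log(w)
        for xi, m, s in zip(x, mu, ls):
            z = (xi - m) / math.exp(s)
            lp += -0.5 * math.log(2 * math.pi) - s - 0.5 * z * z
        comp.append(lp)
    top = max(comp)
    return top + math.log(sum(math.exp(c - top) for c in comp))


def mdn_nll(xs, weights, means, log_stds) -> float:
    """Mean negative log-likelihood over the data: the usual MDN training loss."""
    return -sum(gmm_log_likelihood(x, weights, means, log_stds) for x in xs) / len(xs)


def sample_gmm(weights, means, log_stds, rng: random.Random) -> List[float]:
    """Pick a component according to its weight, then sample from that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(m, math.exp(s)) for m, s in zip(means[k], log_stds[k])]


# Toy 1-D "prosody" values forming two clusters (made-up data).
xs = [[-2.0], [-1.8], [-2.2], [2.0], [1.8], [2.2]]

# Single Gaussian fitted by moments: mean 0, std = sqrt(mean of x^2).
std1 = math.sqrt(sum(x[0] ** 2 for x in xs) / len(xs))
single = mdn_nll(xs, [1.0], [[0.0]], [[math.log(std1)]])

# Two-component GMM, decoded as if from a raw MDN output vector.
raw = [0.0, 0.0, -2.0, 2.0, math.log(0.2), math.log(0.2)]
w, mu, ls = split_mdn_outputs(raw, k=2, d=1)
mixture = mdn_nll(xs, w, mu, ls)

print(f"single Gaussian NLL: {single:.3f}, 2-component GMM NLL: {mixture:.3f}")
print("one sampled value:", sample_gmm(w, mu, ls, random.Random(0)))
```

On this bimodal toy data the single Gaussian must place its mean between the clusters, where no data lies, so its negative log-likelihood is much higher than the mixture's. At inference, drawing a component per phone and then sampling from it is one way a mixture can yield varied prosody without manual control, whereas a unimodal model centres every draw on one mean.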
