F0 Modeling in HMM-Based Speech Synthesis System Using Deep Belief Network

02/18/2015
by   Sankar Mukherjee, et al.

In recent years, multilayer perceptrons (MLPs) with many hidden layers, known as deep neural networks (DNNs), have performed surprisingly well in many speech tasks, e.g. speech recognition, speaker verification, and speech synthesis. In the context of F0 modeling, however, these techniques have not been properly exploited. In this paper, a Deep Belief Network (DBN), a member of the DNN family, is applied to model the F0 contour of synthesized speech generated by an HMM-based speech synthesis system. The experiments were conducted on the Bengali language. Several DBN-DNN architectures, ranging from four to seven hidden layers with up to 200 hidden units per hidden layer, are presented and evaluated. The results are compared against the clustering tree techniques commonly used in statistical parametric speech synthesis. We show that a DBN-DNN learns a high-level structure from textual inputs, which in turn improves the F0 contour in both objective and subjective tests.
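
To make the architecture range concrete, the following is a minimal sketch of the forward pass of a feed-forward network with four hidden layers of 200 units that maps a textual feature vector to a single F0 value. It is not the authors' implementation. The input size of 50, the sigmoid hidden units, the linear output, and the names `build_network` and `predict_f0` are all assumptions made for illustration, and the random weights only stand in for weights that a DBN would obtain through pretraining and fine-tuning, which this sketch does not show.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return small random weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_network(n_inputs, n_hidden_layers, n_hidden_units, seed=0):
    """Stack hidden layers of equal width and finish with one output unit."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [n_hidden_units] * n_hidden_layers + [1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_f0(network, features):
    """Forward pass: sigmoid hidden layers, linear output (assumed)."""
    activations = features
    last = len(network) - 1
    for i, (weights, biases) in enumerate(network):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        activations = pre if i == last else [sigmoid(z) for z in pre]
    return activations[0]


# Hypothetical setup: 50 textual features, 4 hidden layers, 200 units each.
network = build_network(n_inputs=50, n_hidden_layers=4, n_hidden_units=200)
features = [0.0] * 50
features[3] = 1.0  # e.g. a one-hot linguistic context flag (illustrative)
f0 = predict_f0(network, features)
```

Changing `n_hidden_layers` from 4 to 7 covers the depth range described in the abstract; the untrained output value itself carries no meaning.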

