Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions

06/22/2018
by Yu Gu, et al.

This paper introduces an improved generative model for statistical parametric speech synthesis (SPSS) based on WaveNet under a multi-task learning framework. Unlike the original WaveNet, the proposed Multi-task WaveNet uses frame-level acoustic feature prediction as a secondary task, which removes the need for the external fundamental frequency prediction model that the original WaveNet relies on. The improved WaveNet can therefore generate high-quality speech waveforms conditioned only on linguistic features. By addressing the accumulation of pitch prediction errors, Multi-task WaveNet produces more natural and expressive speech, and its inference procedure is more succinct than that of the original WaveNet. Experimental results show that the proposed SPSS method outperforms the state-of-the-art approach based on the original WaveNet in both objective tests and subjective preference tests.
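
The abstract does not state how the two tasks are combined during training. As a rough illustration only, the sketch below assumes a common multi-task formulation: a primary cross-entropy loss over quantized waveform samples plus a weighted mean-squared-error loss on frame-level acoustic features. The function names, the `aux_weight` parameter, and its default value are hypothetical and are not taken from the paper.

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of one quantized sample under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def mse(pred, ref):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)

def multi_task_loss(sample_logits, sample_targets, feat_pred, feat_ref,
                    aux_weight=0.5):
    """Assumed combination: primary loss + aux_weight * secondary loss.

    sample_logits / sample_targets: per-sample class scores and true classes
    (primary task, waveform generation).
    feat_pred / feat_ref: per-frame predicted and reference acoustic features
    (secondary task). aux_weight is a hypothetical hyperparameter.
    """
    primary = sum(cross_entropy(l, t)
                  for l, t in zip(sample_logits, sample_targets))
    primary /= len(sample_targets)
    secondary = sum(mse(p, r) for p, r in zip(feat_pred, feat_ref))
    secondary /= len(feat_ref)
    return primary + aux_weight * secondary

# Toy usage: two samples over 4 quantization levels, one 2-dim feature frame.
loss = multi_task_loss(
    sample_logits=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    sample_targets=[1, 3],
    feat_pred=[[1.0, 0.0]],
    feat_ref=[[0.0, 0.0]],
)
```

In a setup like this, the secondary branch would matter only during training; at synthesis time the network would be conditioned on linguistic features alone, in line with the abstract's claim that no external fundamental frequency model is needed.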

