Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

04/06/2022
by Shun Lei, et al.

Previous work on expressive speech synthesis focuses on modelling a mono-scale style embedding from the current sentence or its context, neglecting the multi-scale nature of speaking style in human speech. In this paper, we propose a multi-scale speaking style modelling method that captures and predicts speaking style at multiple scales to improve the naturalness and expressiveness of synthetic speech. A multi-scale extractor extracts speaking style embeddings at three different levels from the ground-truth speech and explicitly guides the training of a multi-scale style predictor based on hierarchical context information. Both objective and subjective evaluations on a Mandarin audiobook dataset demonstrate that the proposed method can significantly improve the naturalness and expressiveness of the synthesized speech.

research
07/29/2023

MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

Expressive speech synthesis is crucial for many human-computer interacti...
research
03/23/2022

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

Previous works on expressive speech synthesis mainly focus on current se...
research
04/13/2023

Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis

Recent advances in text-to-speech have significantly improved the expres...
research
06/29/2021

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to s...
research
11/04/2022

Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts

We present a multi-speaker Japanese audiobook text-to-speech (TTS) syste...
research
06/25/2022

Self-supervised Context-aware Style Representation for Expressive Speech Synthesis

Expressive speech synthesis, like audiobook synthesis, is still challeng...
research
03/31/2022

Manipulation of oral cancer speech using neural articulatory synthesis

We present an articulatory synthesis framework for the synthesis and man...