Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

08/31/2023
by   Jie Chen, et al.

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence how the target utterance is interpreted in speech, previous work on PSP mainly uses only the intra-utterance linguistic information of the current utterance. This work proposes to use inter-utterance linguistic information to improve the performance of PSP. Multi-level contextual information, which includes both inter-utterance and intra-utterance linguistic information, is extracted by a hierarchical encoder from the character level, utterance level and discourse level of the input text. A multi-task learning (MTL) decoder then predicts prosodic boundaries from this multi-level contextual information. Objective evaluation results on two datasets show that our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase (IPH) boundaries. These results demonstrate the effectiveness of using multi-level contextual information for PSP. Subjective preference tests also indicate that the naturalness of the synthesized speech is improved.
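
To make the objective metric concrete, the following is a minimal sketch of how per-level F1 scores for PW, PPH and IPH boundaries could be computed. It is not the authors' evaluation code. The label encoding (0 = no boundary, 1 = PW, 2 = PPH, 3 = IPH, one label per character) and the convention that a higher-level boundary also counts as every lower-level boundary are assumptions made for illustration, and the function name `boundary_f1` is hypothetical.

```python
from typing import Dict, List

# Assumed label encoding (not taken from the paper):
# 0 = no boundary, 1 = PW, 2 = PPH, 3 = IPH, one label per character.
LEVELS: Dict[str, int] = {"PW": 1, "PPH": 2, "IPH": 3}


def boundary_f1(gold: List[int], pred: List[int]) -> Dict[str, float]:
    """Return a binary F1 score per prosodic level.

    Assumption: a boundary labelled at level k also counts as a boundary
    at every lower level, so level k is scored on `label >= k`.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    scores: Dict[str, float] = {}
    for name, level in LEVELS.items():
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g >= level and p >= level)
        fp = sum(1 for g, p in pairs if g < level and p >= level)
        fn = sum(1 for g, p in pairs if g >= level and p < level)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is
        # nothing to find and nothing was predicted.
        scores[name] = 2 * tp / denom if denom else 0.0
    return scores
```

Called on a gold and a predicted label sequence of equal length, `boundary_f1` returns a dictionary with one score per level, so the three levels can be compared separately, as the abstract does.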

