Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data

11/15/2021
by Zhu Li, et al.

Recent advances in end-to-end speech synthesis have made it possible to generate highly natural speech. However, training these models typically requires a large amount of high-fidelity speech data, and the prosody of speech synthesized for unseen texts is still relatively unnatural. To address these issues, we propose combining a fine-tuned BERT-based front-end with a pre-trained FastSpeech 2-based acoustic model to improve prosody modeling. The pre-trained BERT is fine-tuned in a multi-task learning framework on three tasks: polyphone disambiguation, joint Chinese word segmentation (CWS) and part-of-speech (POS) tagging, and prosody structure prediction (PSP). FastSpeech 2 is pre-trained on large-scale external data that are noisy but easier to obtain. Experimental results show that both the fine-tuned BERT model and the pre-trained FastSpeech 2 improve prosody, especially for structurally complex sentences.
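The abstract does not say how the three front-end objectives are combined during fine-tuning. As a minimal sketch of one common choice, assuming each task has its own token-level classification head on a shared encoder and that the total objective is a weighted sum of per-task cross-entropy losses, the idea can be written in plain Python. The task names, the weights, and the use of `None` to mark tokens a task does not label (for example, non-polyphonic characters) are illustrative assumptions, not details from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_cross_entropy(logits_per_token, labels):
    """Mean cross-entropy over labeled tokens.

    A label of None marks a token this task does not supervise
    (an assumption of this sketch); such tokens are skipped.
    """
    losses = []
    for logits, label in zip(logits_per_token, labels):
        if label is None:
            continue
        losses.append(-math.log(softmax(logits)[label]))
    return sum(losses) / len(losses) if losses else 0.0


def multitask_loss(task_logits, task_labels, task_weights):
    """Weighted sum of per-task losses (hypothetical combination rule)."""
    total = 0.0
    for task, logits in task_logits.items():
        total += task_weights[task] * token_cross_entropy(logits, task_labels[task])
    return total


# Toy two-token sentence; in a real system each head's logits would come
# from a task-specific layer on top of the shared BERT encoder.
logits = {
    "polyphone": [[0.2, 1.1], [0.0, 0.0]],
    "cws_pos": [[1.5, -0.3, 0.1], [0.0, 2.0, 0.4]],
    "psp": [[0.3, 0.3], [1.0, -1.0]],
}
labels = {
    "polyphone": [None, 1],  # only the second character is polyphonic here
    "cws_pos": [0, 1],
    "psp": [1, 0],
}
weights = {"polyphone": 1.0, "cws_pos": 1.0, "psp": 1.0}

print(multitask_loss(logits, labels, weights))
```

Equal weights are used only for simplicity; how the tasks are actually balanced is a tuning decision that the abstract leaves unspecified.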


