Unsupervised word-level prosody tagging for controllable speech synthesis

02/15/2022
by Yiwei Guo, et al.

Although word-level prosody modeling in neural text-to-speech (TTS) has been investigated in recent research for diverse speech synthesis, it is still challenging to control speech synthesis manually without a specific reference. This is largely due to the lack of word-level prosody tags. In this work, we propose a novel two-stage approach for unsupervised word-level prosody tagging: we first group the words into different types with a decision tree according to their phonetic content, and then cluster the prosodies within each type of words separately using a Gaussian mixture model (GMM). This design is based on the assumption that the prosodies of different types of words, such as long or short words, should be tagged with different label sets. Furthermore, a TTS system is trained with the derived word-level prosody tags for controllable speech synthesis. Experiments on LJSpeech show that the TTS model trained with word-level prosody tags not only achieves better naturalness than a typical FastSpeech2 model but also gains the ability to manipulate word-level prosody.
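The two-stage tagging idea in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes each word's prosody is summarized by a single scalar value (a stand-in for a real word-level prosody representation). It replaces the learned decision tree with a fixed phone-count threshold (`word_type`) and fits a small 1-D GMM per word type with EM (`fit_gmm_1d`). All function names, the threshold, and the toy data are made up for illustration.

```python
import math
from collections import defaultdict


def word_type(phone_count, threshold=4):
    """Stage 1 (simplified): assign a word type from its phonetic content.

    Hypothetical rule: one phone-count threshold stands in for the
    learned decision tree described in the abstract.
    """
    return "long" if phone_count > threshold else "short"


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm_1d(xs, k, iters=100, var_floor=1e-6):
    """Stage 2: fit a 1-D Gaussian mixture to xs with EM.

    Means start at evenly spaced quantiles, so the fit is deterministic.
    Returns (weights, means, variances).
    """
    n = len(xs)
    ordered = sorted(xs)
    mu = sum(xs) / n
    var0 = sum((x - mu) ** 2 for x in xs) / n + var_floor
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, normalised in the log domain to avoid underflow.
        resp = []
        for x in xs:
            logs = [math.log(w) + log_gauss(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            ps = [math.exp(lg - top) for lg in logs]
            total = sum(ps)
            resp.append([p / total for p in ps])
        # M-step: re-estimate weights, means and floored variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, xs)) / nj + var_floor
    return weights, means, variances


def tag_words(words, k=2):
    """Tag each (word, phone_count, prosody_value) as '<type>_<cluster>'.

    Clusters are ranked by mean within each type, so '<type>_0' is the
    lowest-valued cluster of that type.
    """
    by_type = defaultdict(list)
    for i, (_, phones, _) in enumerate(words):
        by_type[word_type(phones)].append(i)
    tags = [None] * len(words)
    for wtype, idxs in by_type.items():
        xs = [words[i][2] for i in idxs]
        kt = min(k, len(xs))  # cannot fit more components than points
        weights, means, variances = fit_gmm_1d(xs, kt)
        order = sorted(range(kt), key=lambda j: means[j])
        rank = {j: r for r, j in enumerate(order)}
        for i, x in zip(idxs, xs):
            logs = [math.log(w) + log_gauss(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            best = max(range(kt), key=lambda j: logs[j])
            tags[i] = f"{wtype}_{rank[best]}"
    return tags


# Toy data (made up): (word, approximate phone count, scalar prosody value).
words = [
    ("a", 1, 1.0), ("the", 2, 1.2), ("of", 2, 0.8),
    ("to", 2, 5.0), ("and", 3, 5.3), ("is", 2, 4.7),
    ("synthesis", 8, 10.0), ("prosody", 7, 10.5), ("unsupervised", 10, 9.5),
    ("controllable", 11, 20.0), ("reference", 8, 19.0), ("different", 7, 21.0),
]
print(tag_words(words))
# -> ['short_0', 'short_0', 'short_0', 'short_1', 'short_1', 'short_1',
#     'long_0', 'long_0', 'long_0', 'long_1', 'long_1', 'long_1']
```

Because each word type gets its own GMM, `short_1` and `long_1` come from different label sets. This mirrors the abstract's assumption that long and short words should be tagged separately. Under this sketch, manipulating a word's prosody at synthesis time would amount to swapping its tag for another label from the same type's set.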
