End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training

06/26/2019
by Peng-fei Wu, et al.

This paper proposes an end-to-end emotional speech synthesis (ESS) method that adopts global style tokens (GSTs) for semi-supervised training. The model is built on the GST-Tacotron framework. The style tokens are defined to represent emotion categories. A cross-entropy loss between token weights and emotion labels is designed to make the style tokens interpretable, using the small portion of training data that has emotion labels. Emotion recognition experiments confirm that this method effectively achieves a one-to-one correspondence between style tokens and emotion categories. Objective and subjective evaluation results show that our model outperforms the conventional Tacotron model for ESS when only 5% of the training data has emotion labels. Its subjective performance is close to that of the Tacotron model trained using all emotion labels.
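
The semi-supervised loss described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one style token per emotion category, token weights obtained by a softmax over attention scores, and unlabeled utterances marked with `None`. The function names are hypothetical, and how this term is weighted against the Tacotron reconstruction loss is not stated in the abstract, so that part is left out.

```python
import math

def softmax(scores):
    """Turn raw attention scores into token weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def token_cross_entropy(token_weights, label):
    """Cross entropy between token weights and a one-hot emotion label."""
    # Assumption: token index i stands for emotion category i.
    return -math.log(max(token_weights[label], 1e-12))

def semi_supervised_emotion_loss(batch_scores, batch_labels):
    """Average the cross entropy over the labeled utterances only.

    batch_labels holds an emotion index, or None for an unlabeled
    utterance, which contributes nothing to this loss term.
    """
    losses = [
        token_cross_entropy(softmax(scores), label)
        for scores, label in zip(batch_scores, batch_labels)
        if label is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0

# Two utterances, four tokens; only the first utterance has a label.
scores = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
labels = [0, None]
loss = semi_supervised_emotion_loss(scores, labels)
```

Because unlabeled utterances are skipped, they would still be usable for the synthesis objective while only the labeled portion pushes each token toward its emotion category.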


research
10/08/2021

Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

In expressive speech synthesis, there are high requirements for emotion ...
research
11/05/2019

Emotional Speech Synthesis with Rich and Granularized Control

This paper proposes an effective emotion control method for an end-to-en...
research
06/13/2019

Adjusting Pleasure-Arousal-Dominance for Continuous Emotional Text-to-speech Synthesizer

Emotion is not limited to discrete categories of happy, sad, angry, fear...
research
03/23/2018

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

In this work, we propose "global style tokens" (GSTs), a bank of embeddi...
research
11/01/2017

Uncovering Latent Style Factors for Expressive Speech Synthesis

Prosodic modeling is a core problem in speech synthesis. The key challen...
research
11/15/2017

Emotional End-to-End Neural Speech Synthesizer

In this paper, we introduce an emotional speech synthesizer based on the...
research
10/03/2022

To Improve Is to Change: Towards Improving Mood Prediction by Learning Changes in Emotion

Although the terms mood and emotion are closely related and often used i...
