Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

10/08/2021
by Pengfei Wu, et al.

In expressive speech synthesis, there are high requirements for emotion interpretation. However, acquiring an emotional audio corpus for arbitrary speakers is time-consuming, since not every speaker has the ability to perform emotional speech. In response to this problem, this paper proposes a cross-speaker emotion transfer method that transfers emotions from a source speaker to a target speaker. A set of emotion tokens is first defined to represent the various emotion categories. Through a cross-entropy loss and a semi-supervised training strategy, the tokens are trained to be highly correlated with their corresponding emotions, enabling controllable synthesis. Meanwhile, to counteract the degradation of timbre similarity caused by cross-speaker emotion transfer, speaker condition layer normalization is applied to model speaker characteristics. Experimental results show that the proposed method outperforms a multi-reference baseline in terms of timbre similarity, stability, and emotion perception evaluations.
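The speaker condition layer normalization mentioned in the abstract can be illustrated with a minimal sketch: instead of using fixed, speaker-independent gain and bias parameters, the layer norm's scale and shift are predicted from a speaker embedding. The sketch below assumes simple linear projections (`w_gamma`, `w_beta`) from the speaker embedding; the actual projection layers, dimensions, and names in the paper may differ.

```python
# Hypothetical sketch of speaker condition layer normalization (SCLN):
# the layer-norm gain and bias are predicted from a speaker embedding
# instead of being speaker-independent learned parameters.
import numpy as np

def scln(x, speaker_emb, w_gamma, w_beta, eps=1e-5):
    """Normalize x over its last axis, then scale and shift with
    speaker-dependent gamma and beta (all names are illustrative)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_norm = (x - mean) / np.sqrt(var + eps)
    gamma = speaker_emb @ w_gamma   # (hidden,) scale for this speaker
    beta = speaker_emb @ w_beta     # (hidden,) shift for this speaker
    return gamma * x_norm + beta

rng = np.random.default_rng(0)
hidden, spk_dim = 8, 4
x = rng.standard_normal((2, hidden))           # two encoder frames
spk = rng.standard_normal(spk_dim)             # target-speaker embedding
w_gamma = rng.standard_normal((spk_dim, hidden))
w_beta = rng.standard_normal((spk_dim, hidden))
y = scln(x, spk, w_gamma, w_beta)              # shape (2, hidden)
```

Because gamma and beta depend only on the speaker embedding, the same emotion-conditioned hidden states can be rendered in a different speaker's timbre by swapping the embedding.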


Related research

06/29/2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
The capability of generating speech with specific type of emotion is des...

06/26/2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training
This paper proposes an end-to-end emotional speech synthesis (ESS) metho...

03/15/2023
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents
In recent years, emotional text-to-speech has shown considerable progres...

10/08/2020
Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech
Emotional state of a speaker is found to have significant effect in spee...

07/05/2022
A cross-corpus study on speech emotion recognition
For speech emotion datasets, it has been difficult to acquire large quan...

05/27/2019
EG-GAN: Cross-Language Emotion Gain Synthesis based on Cycle-Consistent Adversarial Networks
Despite remarkable contributions from existing emotional speech synthesi...
