Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis

11/17/2020
by   Yi Lei, et al.

This paper proposes a unified model that performs emotion transfer, control, and prediction for sequence-to-sequence based fine-grained emotional speech synthesis. Conventional emotional speech synthesis often relies on manual labels or reference audio to determine the emotional expression of the synthesized speech. Such coarse labels cannot control the details of speech emotion and often result in an averaged emotional expression, and it is also hard to choose suitable reference audio during inference. To generate fine-grained emotional expression, we introduce phoneme-level emotion strength representations, obtained through a learned ranking function, to describe local emotion details, and we adopt the sentence-level emotion category to render the global emotion of the synthesized speech. With this global rendering and these local descriptors, fine-grained emotional expression can be obtained either from reference audio via its emotion descriptors (for transfer) or directly from phoneme-level manual labels (for control). For emotional speech synthesis with arbitrary text input, the proposed model can also predict phoneme-level emotion expressions from the text, which requires neither reference audio nor manual labels.
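The abstract does not specify the form of the learned ranking function, so the following is only an illustrative sketch of how phoneme-level emotion strength could be derived from one. It assumes a linear ranker trained with a pairwise hinge loss so that emotional segments score above neutral ones, followed by min-max normalization of the scores within an utterance. The function names, the two-dimensional toy features, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranker(emotional, neutral, epochs=50, lr=0.1, margin=1.0, seed=0):
    """Learn a linear ranker w (assumed form) with a pairwise hinge loss.

    Each (emotional, neutral) pair should satisfy
    w . x_emotional >= w . x_neutral + margin.
    """
    rng = random.Random(seed)
    w = [0.0] * len(emotional[0])
    pairs = [(e, n) for e in emotional for n in neutral]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for e, n in pairs:
            diff = [a - b for a, b in zip(e, n)]
            # Update only when the pair violates the margin.
            if dot(w, diff) < margin:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def phoneme_strengths(w, phoneme_features):
    """Score each phoneme segment and min-max normalize to [0, 1]."""
    scores = [dot(w, x) for x in phoneme_features]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


# Toy features (hypothetical): [pitch range, energy] per segment.
emotional = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]
neutral = [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
w = train_ranker(emotional, neutral)

# One utterance with three phoneme segments of increasing expressiveness.
utterance = [[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]]
strengths = phoneme_strengths(w, utterance)
print(strengths)
```

In this sketch, the resulting per-phoneme values play the role of the local descriptors: they could be extracted from reference audio (transfer) or replaced by hand-specified values of the same shape (control).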


