EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

06/01/2023
by Haobin Tang, et al.

There has been significant progress in emotional Text-To-Speech (TTS) synthesis technology in recent years. However, existing methods primarily focus on synthesizing a limited number of emotion types and perform unsatisfactorily in intensity control. To address these limitations, we propose EmoMix, which can generate emotional speech with a specified intensity or a mixture of emotions. Specifically, EmoMix is a controllable emotional TTS model based on a diffusion probabilistic model and a pre-trained speech emotion recognition (SER) model that is used to extract emotion embeddings. Mixed emotion synthesis is achieved at run-time by combining the noises predicted by the diffusion model conditioned on different emotions within a single sampling process. To control intensity, we further mix Neutral with a specific primary emotion in varying degrees. Experimental results validate the effectiveness of EmoMix for mixed emotion synthesis and intensity control.
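
The noise-combination idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a convex (weighted-average) combination of the per-emotion noise predictions and a standard DDPM ancestral update as a stand-in sampler; the abstract specifies neither. `toy_noise_predictor`, `mix_predicted_noise`, and `sample_mixed` are hypothetical names, and the toy predictor merely replaces a trained, emotion-conditioned denoiser.

```python
import numpy as np


def mix_predicted_noise(eps_by_emotion, weights):
    """Weighted average of per-emotion noise predictions (assumed mixing rule)."""
    names = list(weights)
    w = np.array([weights[n] for n in names], dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalize so the weights sum to 1
    return sum(wi * eps_by_emotion[n] for wi, n in zip(w, names))


def toy_noise_predictor(x_t, t, emotion_embedding):
    """Hypothetical stand-in for a trained denoiser conditioned on an emotion."""
    return 0.1 * x_t + emotion_embedding


def sample_mixed(shape, embeddings, weights, num_steps=50, seed=0):
    """Single reverse-diffusion pass that mixes noise predictions at every step.

    Uses the standard DDPM update as an assumption, not the paper's sampler.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)
    for t in range(num_steps - 1, -1, -1):
        # One prediction per emotion condition, all from the same x_t.
        eps = {name: toy_noise_predictor(x, t, emb)
               for name, emb in embeddings.items()}
        eps_mix = mix_predicted_noise(eps, weights)

        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_mix) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


# Intensity control as a Neutral/primary-emotion mix: 30% "Happy", 70% "Neutral".
embeddings = {"Neutral": np.zeros(4), "Happy": np.ones(4)}
sample = sample_mixed((4,), embeddings, {"Neutral": 0.7, "Happy": 0.3})
```

Under these assumptions, raising the weight of the primary emotion relative to Neutral plays the role of an intensity knob, while weighting two non-neutral emotions gives a mixed-emotion condition; in both cases only one sampling pass is run.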


research
08/11/2022

Speech Synthesis with Mixed Emotions

Emotional speech synthesis aims to synthesize human voices with various ...
research
11/17/2022

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

Although current neural text-to-speech (TTS) models are able to generate...
research
04/03/2021

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability

Emotional text-to-speech synthesis (ETTS) has seen much progress in rece...
research
06/28/2023

EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech

State-of-the-art speech synthesis models try to get as close as possible...
research
11/11/2022

Continuous Emotional Intensity Controllable Speech Synthesis using Semi-supervised Learning

With the rapid development of the speech synthesis system, recent text-t...
research
12/16/2020

How the emotion's type and intensity affect rumor spreading

The implication and contagion effect of emotion cannot be ignored in rum...
research
10/27/2022

Explicit Intensity Control for Accented Text-to-speech

Accented text-to-speech (TTS) synthesis seeks to generate speech with an...
