Emotion Intensity and its Control for Emotional Voice Conversion

01/10/2022
by   Kun Zhou, et al.
6

Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity. In EVC, emotions are usually treated as discrete categories overlooking the fact that speech also conveys emotions with various intensity levels that the listener can perceive. In this paper, we aim to explicitly characterize and control the intensity of emotion. We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding. We further learn the actual emotion encoder from an emotion-labelled database and study the use of relative attributes to represent fine-grained emotion intensity. To ensure emotional intelligibility, we incorporate emotion classification loss and emotion embedding similarity loss into the training of the EVC network. As desired, the proposed network controls the fine-grained emotion intensity in the output speech. Through both objective and subjective evaluations, we validate the effectiveness of the proposed network for emotional expressiveness and emotion intensity control.

READ FULL TEXT

page 10

page 12

page 13

page 17

research
10/20/2021

Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity

Expressive voice conversion performs identity conversion for emotional s...
research
03/14/2023

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

Recent expressive text to speech (TTS) models focus on synthesizing emot...
research
10/25/2022

Mixed Emotion Modelling for Emotional Voice Conversion

Emotional voice conversion (EVC) aims to convert the emotional state of ...
research
03/02/2022

U-Singer: Multi-Singer Singing Voice Synthesizer that Controls Emotional Intensity

We propose U-Singer, the first multi-singer emotional singing voice synt...
research
01/14/2021

EmoCat: Language-agnostic Emotional Voice Conversion

Emotional voice conversion models adapt the emotion in speech without ch...
research
08/16/2023

AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis

Affect is an emotional characteristic encompassing valence, arousal, and...
research
09/14/2023

StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings

Voice conversion (VC) transforms an utterance to sound like another pers...

Please sign up or login with your details

Forgot password? Click here to reset