Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity

10/20/2021
by   Zongyang Du, et al.
0

Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and speaker-dependent emotion style. Due to the hierarchical structure of speech emotion, it is challenging to disentangle the speaker-dependent emotional style for expressive voice conversion. Motivated by the recent success on speaker disentanglement with variational autoencoder (VAE), we propose an expressive voice conversion framework which can effectively disentangle linguistic content, speaker identity, pitch, and emotional style information. We study the use of emotion encoder to model emotional style explicitly, and introduce mutual information (MI) losses to reduce the irrelevant information from the disentangled emotion representations. At run-time, our proposed framework can convert both speaker identity and speaker-dependent emotional style without the need for parallel data. Experimental results validate the effectiveness of our proposed framework in both objective and subjective evaluations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/08/2021

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer

Traditional voice conversion(VC) has been focused on speaker identity co...
research
05/13/2020

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion

Emotional voice conversion aims to convert the emotion of the speech fro...
research
03/31/2021

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training

Emotional voice conversion (EVC) aims to change the emotional state of a...
research
01/10/2022

Emotion Intensity and its Control for Emotional Voice Conversion

Emotional voice conversion (EVC) seeks to convert the emotional state of...
research
11/03/2018

Nonparallel Emotional Speech Conversion

We propose a nonparallel data-driven emotional speech conversion method....
research
10/29/2019

Disentangling Timbre and Singing Style with Multi-singer Singing Synthesis System

In this study, we define the identity of the singer with two independent...
research
02/21/2023

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network Virtual Domain Pairing

Primary goal of an emotional voice conversion (EVC) system is to convert...

Please sign up or login with your details

Forgot password? Click here to reset