StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings

09/14/2023
by Arnab Das et al.

Voice conversion (VC) transforms an utterance so that it sounds like another speaker without changing its linguistic content. A recently proposed generative adversarial network (GAN)-based VC method, StarGANv2-VC, is very successful at generating natural-sounding conversions. However, it fails to preserve the source speaker's emotion in the converted samples, and emotion preservation is necessary for natural human-computer interaction. In this paper, we show that StarGANv2-VC fails to disentangle the speaker and emotion representations, a disentanglement that is needed to preserve emotion. Specifically, emotion leaks from the reference audio used to capture the speaker embeddings during training. To counter this problem, we propose novel emotion-aware losses and an unsupervised method that exploits emotion supervision through latent emotion representations. Objective and subjective evaluations demonstrate the efficacy of the proposed strategy across diverse datasets, emotions, genders, and other conditions.
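The abstract does not specify the form of the emotion-aware losses. As a minimal sketch of the general idea only, assume a pretrained emotion encoder (hypothetical here) that maps an utterance to a fixed-size emotion embedding. An emotion-preservation term can then penalize the distance between the embeddings of the source utterance and the converted sample. The function names and the choice of cosine distance below are illustrative assumptions, not the paper's actual losses.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def emotion_consistency_loss(
    source_emb: Sequence[float], converted_emb: Sequence[float]
) -> float:
    """Hypothetical emotion-preservation loss (not the paper's formulation).

    Returns 1 - cos(source, converted): 0 when the two emotion embeddings
    point in the same direction, 2 when they point in opposite directions.
    """
    if len(source_emb) != len(converted_emb):
        raise ValueError("embeddings must have the same dimension")
    return 1.0 - cosine_similarity(source_emb, converted_emb)


if __name__ == "__main__":
    # Same emotion direction: loss is (numerically) zero.
    print(emotion_consistency_loss([0.2, 0.7, 0.1], [0.2, 0.7, 0.1]))
    # Orthogonal emotion embeddings: loss is 1.0.
    print(emotion_consistency_loss([1.0, 0.0], [0.0, 1.0]))
```

In a training setup such a term would typically be weighted and added to the existing adversarial and reconstruction objectives. Cosine distance is used here only because it ignores embedding scale. A differentiable tensor version would be needed for actual gradient-based training.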



research
09/14/2023

Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion

Speech anonymisation prevents misuse of spoken data by removing any pers...
research
02/21/2023

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network Virtual Domain Pairing

Primary goal of an emotional voice conversion (EVC) system is to convert...
research
01/10/2022

Emotion Intensity and its Control for Emotional Voice Conversion

Emotional voice conversion (EVC) seeks to convert the emotional state of...
research
10/25/2022

Mixed Emotion Modelling for Emotional Voice Conversion

Emotional voice conversion (EVC) aims to convert the emotional state of ...
research
07/18/2021

An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation

Emotional Voice Conversion (EVC) aims to convert the emotional style of ...
research
11/10/2022

EmoFake: An Initial Dataset for Emotion Fake Audio Detection

There are already some datasets used for fake audio detection, such as t...
research
09/14/2023

EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data

Speech emotion conversion is the task of converting the expressed emotio...