Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion

09/14/2023
by Suhita Ghosh, et al.

Speech anonymisation prevents misuse of spoken data by removing personal identifiers while preserving at least the linguistic content. However, emotion preservation is crucial for natural human-computer interaction. The well-known voice conversion technique StarGANv2-VC achieves anonymisation but fails to preserve emotion. This work presents an any-to-many semi-supervised StarGANv2-VC variant trained on partially emotion-labelled non-parallel data. We propose emotion-aware losses computed on the emotion embeddings and on acoustic features correlated with emotion. Additionally, we use an emotion classifier to provide direct emotion supervision. Objective and subjective evaluations show that the proposed approach significantly improves emotion preservation over the vanilla StarGANv2-VC. The improvement holds across diverse datasets, emotions, target speakers, and inter-group conversions, without compromising intelligibility or anonymisation.
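The abstract names three emotion signals: a loss on emotion embeddings, a loss on emotion-correlated acoustic features, and a classifier that supervises emotion directly while only part of the data is labelled. The sketch below is a generic illustration of how such terms can be combined. It is not the authors' formulation. The L1 distances, the choice of an energy-like per-frame feature, the use of the source emotion label as the classifier target, the equal default weights, and every function name are hypothetical. The semi-supervised aspect is modelled by skipping unlabelled items (`None`) in the classifier term.

```python
import math
from typing import List, Optional, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute (L1) difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def emotion_embedding_loss(src_emb: Sequence[float], conv_emb: Sequence[float]) -> float:
    """Hypothetical: penalise drift between emotion embeddings of source and converted speech."""
    return mean_abs_diff(src_emb, conv_emb)


def acoustic_feature_loss(src_feat: Sequence[float], conv_feat: Sequence[float]) -> float:
    """Hypothetical: penalise drift in a per-frame emotion-correlated feature (e.g. energy)."""
    return mean_abs_diff(src_feat, conv_feat)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def classifier_loss(batch_logits: List[Sequence[float]], labels: List[Optional[int]]) -> float:
    """Average cross-entropy over labelled items only; unlabelled items (None) are skipped."""
    terms = [cross_entropy(lg, y) for lg, y in zip(batch_logits, labels) if y is not None]
    return sum(terms) / len(terms) if terms else 0.0


def emotion_aware_loss(src_emb, conv_emb, src_feat, conv_feat, batch_logits, labels,
                       w_emb: float = 1.0, w_feat: float = 1.0, w_cls: float = 1.0) -> float:
    """Weighted sum of the three hypothetical emotion terms (weights are placeholders)."""
    return (w_emb * emotion_embedding_loss(src_emb, conv_emb)
            + w_feat * acoustic_feature_loss(src_feat, conv_feat)
            + w_cls * classifier_loss(batch_logits, labels))


# Toy usage: classifier logits are for converted utterances; the second item is unlabelled.
loss = emotion_aware_loss(
    src_emb=[0.2, 0.8], conv_emb=[0.1, 0.9],
    src_feat=[1.0, 1.2, 0.9], conv_feat=[1.0, 1.0, 1.0],
    batch_logits=[[2.0, 0.5], [0.1, 0.3]], labels=[0, None],
)
print(round(loss, 4))
```

In a real system these terms would be added to the StarGANv2-VC generator objective and computed on tensors with autograd. Plain lists are used here only to keep the sketch self-contained.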

09/14/2023

StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings

Voice conversion (VC) transforms an utterance to sound like another pers...
09/14/2023

EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data

Speech emotion conversion is the task of converting the expressed emotio...
09/30/2019

Semi-supervised voice conversion with amortized variational inference

In this work we introduce a semi-supervised approach to the voice conver...
06/02/2023

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis

Speech emotion conversion aims to convert the expressed emotion of a spo...
08/13/2017

Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings

There exist two main approaches to automatically extract affective orien...
01/14/2021

EmoCat: Language-agnostic Emotional Voice Conversion

Emotional voice conversion models adapt the emotion in speech without ch...
12/07/2022

Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue

Entrainment is the phenomenon by which an interlocutor adapts their spea...