A Diffeomorphic Flow-based Variational Framework for Multi-speaker Emotion Conversion

11/09/2022
by Ravi Shankar, et al.

This paper introduces a new framework for non-parallel emotion conversion in speech. Our framework is based on two key contributions. First, we propose a stochastic version of the popular CycleGAN model. Our modified loss function introduces a Kullback-Leibler (KL) divergence term that aligns the source and target data distributions learned by the generators, thus overcoming the limitations of sample-wise generation. By using a variational approximation to this stochastic loss function, we show that our KL divergence term can be implemented via a paired density discriminator. We term this new architecture a variational CycleGAN (VCGAN). Second, we model the prosodic features of the target emotion as a smooth and learnable deformation of the source prosodic features. This approach provides implicit regularization that offers key advantages in terms of better range alignment to unseen and out-of-distribution speakers. We conduct rigorous experiments and comparative studies to demonstrate that our proposed framework is fairly robust and achieves high performance relative to several state-of-the-art baselines.
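The abstract does not spell out how a discriminator can stand in for a KL divergence term. The following toy sketch illustrates the general density-ratio idea only; it is not the authors' VCGAN implementation. It assumes two known 1-D Gaussians, so the logit of an ideal discriminator is available in closed form as log p(x) - log q(x), and averaging that logit over samples from p gives a Monte Carlo estimate of KL(p || q). All function names and parameter values are hypothetical and chosen for illustration.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log density of a 1-D Gaussian with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def ideal_discriminator_logit(x, p_params, q_params):
    """Logit of an ideal discriminator D(x) = p(x) / (p(x) + q(x)).

    Assumption: both densities are known here, so logit(D(x)) = log p(x) - log q(x).
    In a trained model this logit would come from a learned network instead.
    """
    return gaussian_logpdf(x, *p_params) - gaussian_logpdf(x, *q_params)


def kl_from_logits(samples_from_p, logit_fn):
    """Monte Carlo estimate of KL(p || q): the mean log density ratio over samples from p."""
    return sum(logit_fn(x) for x in samples_from_p) / len(samples_from_p)


def kl_closed_form(p_params, q_params):
    """Exact KL divergence between two 1-D Gaussians, used only as a sanity check."""
    (m1, s1), (m2, s2) = p_params, q_params
    return math.log(s2 / s1) + (s1 * s1 + (m1 - m2) ** 2) / (2.0 * s2 * s2) - 0.5


# Hypothetical (mean, std) pairs standing in for "source" and "target" distributions.
p_params = (0.0, 1.0)
q_params = (1.0, 1.5)

rng = random.Random(0)
samples = [rng.gauss(p_params[0], p_params[1]) for _ in range(100_000)]

kl_estimate = kl_from_logits(
    samples, lambda x: ideal_discriminator_logit(x, p_params, q_params)
)
kl_exact = kl_closed_form(p_params, q_params)
print(f"discriminator-based estimate: {kl_estimate:.3f}, closed form: {kl_exact:.3f}")
```

In practice the densities are unknown and the logit would be produced by a trained discriminator, so the quality of the KL estimate depends on how close that discriminator is to optimal.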


