Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator

07/25/2020
by Ravi Shankar, et al.

We introduce a novel method for emotion conversion in speech that does not require parallel training data. Our approach loosely follows the cycle-GAN scheme, minimizing the reconstruction error from converting back and forth between emotion pairs. Unlike the conventional cycle-GAN, however, our discriminator classifies whether a pair of real and generated input samples corresponds to the desired emotion conversion (e.g., A to B) or to its inverse (B to A). We show that this setup, which we refer to as a variational cycle-GAN (VC-GAN), is equivalent to minimizing the empirical KL divergence between the source features and their cyclic counterpart. In addition, our generator combines a trainable deep network with a fixed generative block to implement a smooth and invertible transformation of the input features, in our case the fundamental frequency (F0) contour. This hybrid architecture regularizes our adversarial training procedure. We use crowdsourcing to evaluate both the emotional saliency and the quality of the synthesized speech. Finally, we show that our model generalizes to new speakers by modifying speech produced by WaveNet.
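The two ideas in the abstract, cycle reconstruction and a discriminator that scores pairs rather than single samples, can be illustrated with a small toy sketch. This is not the authors' implementation: the affine maps `g_ab` and `g_ba`, the logistic `pair_discriminator`, and all numeric values are hypothetical stand-ins chosen only so that the example runs with NumPy alone. The real model uses a trainable network with a fixed generative block, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def g_ab(f0, scale=1.2, shift=10.0):
    """Hypothetical A-to-B generator: an invertible affine map on the F0 contour."""
    return scale * f0 + shift

def g_ba(f0, scale=1.2, shift=10.0):
    """Hypothetical B-to-A generator, defined here as the exact inverse of g_ab."""
    return (f0 - shift) / scale

def cycle_loss(f0_a, f0_b):
    """Mean absolute reconstruction error after converting back and forth."""
    rec_a = g_ba(g_ab(f0_a))  # A -> B -> A
    rec_b = g_ab(g_ba(f0_b))  # B -> A -> B
    return np.mean(np.abs(rec_a - f0_a)) + np.mean(np.abs(rec_b - f0_b))

def pair_discriminator(first, second, w, b=0.0):
    """Toy logistic scorer: probability that (first, second) is an A-to-B pair."""
    x = np.concatenate([first, second])
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# Synthetic F0 contours in Hz (assumed values, not data from the paper).
f0_a = 120.0 + 5.0 * rng.standard_normal(16)
f0_b = 160.0 + 8.0 * rng.standard_normal(16)

loss = cycle_loss(f0_a, f0_b)

# The discriminator sees a (real, generated) pair, not a single sample.
w = 0.001 * rng.standard_normal(32)  # untrained, random weights
p_forward = pair_discriminator(f0_a, g_ab(f0_a), w)  # (real A, generated B)
p_inverse = pair_discriminator(f0_b, g_ba(f0_b), w)  # (real B, generated A)
```

Because the toy generators are exact inverses of each other, the cycle loss is zero up to floating-point error. In a trained model the two generators are learned, and the cycle term only encourages this property.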


research
11/03/2020

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

Emotional voice conversion (EVC) aims to convert the emotion of speech f...
research
05/27/2019

EG-GAN: Cross-Language Emotion Gain Synthesis based on Cycle-Consistent Adversarial Networks

Despite remarkable contributions from existing emotional speech synthesi...
research
11/09/2022

A Diffeomorphic Flow-based Variational Framework for Multi-speaker Emotion Conversion

This paper introduces a new framework for non-parallel emotion conversio...
research
08/18/2020

CinC-GAN for Effective F0 prediction for Whisper-to-Normal Speech Conversion

Recently, Generative Adversarial Networks (GAN)-based methods have shown...
research
02/21/2023

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network Virtual Domain Pairing

Primary goal of an emotional voice conversion (EVC) system is to convert...
research
01/10/2020

Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training

Dysarthria is a motor speech impairment affecting millions of people. Dy...
