Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer

07/08/2021
by Zongyang Du, et al.

Traditional voice conversion (VC) has focused on speaker identity conversion for speech with a neutral expression. We note that emotional expression plays an essential role in daily communication, and that the emotional style of speech can be speaker-dependent. In this paper, we study a technique for jointly converting speaker identity and speaker-dependent emotional style, which we call expressive voice conversion. We propose a StarGAN-based framework that learns a many-to-many mapping across different speakers and takes speaker-dependent emotional style into account without the need for parallel data. To achieve this, we condition the generator on an emotional style encoding derived from a pre-trained speech emotion recognition (SER) model. Experiments validate the effectiveness of the proposed framework in both objective and subjective evaluations. To the best of our knowledge, this is the first study on expressive voice conversion.
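
As a rough illustration of what conditioning a generator on a speaker code and an emotional style encoding can look like, the sketch below appends both to every acoustic frame before it would be passed to a generator. The abstract does not say how the conditioning is implemented, so concatenation is an assumption here, and the function names, the one-hot speaker code, and the toy dimensions are all hypothetical. The emotion embedding is a placeholder standing in for the output of a pre-trained SER model.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot speaker code (hypothetical representation)."""
    if not 0 <= index < size:
        raise ValueError("speaker index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def condition_frames(
    frames: Sequence[Sequence[float]],
    speaker_index: int,
    num_speakers: int,
    emotion_embedding: Sequence[float],
) -> List[List[float]]:
    """Append the same speaker code and emotion embedding to every frame.

    Assumption: conditioning by concatenation. The source abstract only
    states that the generator is conditioned on an emotional style encoding.
    """
    code = one_hot(speaker_index, num_speakers) + list(emotion_embedding)
    return [list(frame) + code for frame in frames]


# Toy input: 2 frames of 3 acoustic features each (made-up values).
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
# Placeholder for an utterance-level embedding from a pre-trained SER model.
emotion_embedding = [0.7, -0.2]

conditioned = condition_frames(
    frames, speaker_index=1, num_speakers=4, emotion_embedding=emotion_embedding
)
```

Each conditioned frame here has 3 + 4 + 2 = 9 values: the original features, the target speaker code, and the emotion embedding. In a real system the frames would be spectral features and the conditioning would feed a trained network rather than a list operation.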

research
10/20/2021

Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity

Expressive voice conversion performs identity conversion for emotional s...
research
04/05/2021

StarGAN-based Emotional Voice Conversion for Japanese Phrases

This paper shows that StarGAN-VC, a spectral envelope transformation met...
research
12/14/2022

Disentangling Prosody Representations with Unsupervised Speech Reconstruction

Human speech can be characterized by different components, including sem...
research
05/10/2021

MASS: Multi-task Anthropomorphic Speech Synthesis Framework

Text-to-Speech (TTS) synthesis plays an important role in human-computer...
research
02/10/2022

Cross-speaker style transfer for text-to-speech using data augmentation

We address the problem of cross-speaker style transfer for text-to-speec...
research
02/01/2020

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data

Emotional voice conversion is to convert the spectrum and prosody to cha...
research
11/30/2021

CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer

In this study, we explore the transformer's ability to capture intra-rel...
