An Overview Analysis of Sequence-to-Sequence Emotional Voice Conversion

03/29/2022
by Zijiang Yang, et al.

Emotional voice conversion (EVC) focuses on converting a speech utterance from a source emotion to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond. However, EVC remains an unsolved research problem with several challenges. In particular, because speech rate and rhythm are two key factors of emotional conversion, models have to generate output sequences whose length differs from that of the input. Sequence-to-sequence modelling has recently emerged as a competitive paradigm for models that can overcome those challenges. In an attempt to stimulate further research in this promising new direction, we systematically investigated and reviewed recent sequence-to-sequence EVC papers from six perspectives: their motivation, training strategies, model architectures, datasets, model inputs, and evaluation methods. This information is organised to provide the research community with an easily digestible overview of the current state of the art. Finally, we discuss existing challenges of sequence-to-sequence EVC.


