The goal of Automatic Voice Over (AVO) is to generate speech in sync wit...
Deep Learning (DL) models have become popular for executing differen...
Speech Emotion Recognition (SER) is a critical enabler of emotion-aware ...
Most current audio-visual emotion recognition models lack the flexibilit...
Text-to-speech (TTS) models have achieved remarkable naturalness in rece...
Accent plays a significant role in speech communication, influencing und...
Emotional voice conversion (EVC) aims to convert the emotional state of ...
Neural models are known to be over-parameterized, and recent work has sh...
Accented text-to-speech (TTS) synthesis seeks to generate speech with an...
Emotional speech synthesis aims to synthesize human voices with various ...
Emotion classification of speech and assessment of the emotion strength ...
Emotional voice conversion (EVC) seeks to convert the emotional state of...
Expressive voice conversion performs identity conversion for emotional s...
Conventional vocoders are commonly used as analysis tools to provide int...
In this paper, we formulate a novel task to synthesize speech in sync wi...
Recently, emotional speech synthesis has achieved remarkable performance...
Traditional voice conversion (VC) has focused on speaker identity co...
In this paper, we first provide a review of the state-of-the-art emotion...
Emotional text-to-speech synthesis (ETTS) has seen much progress in rece...
Emotional voice conversion (EVC) aims to change the emotional state of a...
Emotional voice conversion (EVC) aims to convert the emotion of speech f...
Emotional voice conversion aims to transform emotional prosody in speech...
Attention-based end-to-end text-to-speech synthesis (TTS) is superior to...
Tacotron-based end-to-end speech synthesis has shown remarkable voice qu...
Cross-lingual voice conversion aims to change the source speaker's voice to ...
Singing voice conversion aims to convert a singer's voice from source to t...
Speaker identity is one of the important characteristics of human speech...
We propose a novel training strategy for Tacotron-based text-to-speech (...
Emotional voice conversion aims to convert the emotion of the speech fro...
Tacotron-based text-to-speech (TTS) systems directly synthesize speech f...
Emotional voice conversion aims to convert the spectrum and prosody to cha...
While neural end-to-end text-to-speech (TTS) is superior to conventional...
We describe our submitted system for the ZeroSpeech Challenge 2019. The ...