Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability

04/03/2021
by Rui Liu, et al.

Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years. However, the generated voice is often not perceptually identifiable as its intended emotion category. To address this problem, we propose a new interactive training paradigm for ETTS, denoted i-ETTS, which seeks to directly improve emotion discriminability by interacting with a speech emotion recognition (SER) model. Moreover, we formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization. Experimental results demonstrate that the proposed i-ETTS outperforms state-of-the-art baselines by rendering speech with a more accurate emotion style. To the best of our knowledge, this is the first study of reinforcement learning in emotional text-to-speech synthesis.
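
The abstract gives no implementation details, so the following is only a toy sketch of the general idea of using a recognizer's verdict as a reinforcement-learning reward. It is not the i-ETTS method. It assumes a plain REINFORCE-style update over a small set of discrete "styles", and `toy_ser`, `train`, and `best_style` are hypothetical names introduced for illustration. A real system would synthesize audio and score it with a trained SER model.

```python
import math
import random

EMOTIONS = ["happy", "sad", "angry"]  # illustrative label set (assumption)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_ser(style_index):
    """Hypothetical stand-in for an SER model: style i is heard as emotion i."""
    return EMOTIONS[style_index]


def train(steps=3000, lr=0.5, seed=0):
    """REINFORCE on a tabular policy: one row of style logits per target emotion."""
    rng = random.Random(seed)
    logits = {e: [0.0] * len(EMOTIONS) for e in EMOTIONS}
    for _ in range(steps):
        target = rng.choice(EMOTIONS)
        probs = softmax(logits[target])
        # Sample a style from the current policy (the "synthesis" step).
        action = rng.choices(range(len(EMOTIONS)), weights=probs)[0]
        # Reward is 1 when the recognizer hears the intended emotion, else 0.
        reward = 1.0 if toy_ser(action) == target else 0.0
        # Policy-gradient step: gradient of log pi(action) w.r.t. each logit.
        for i in range(len(EMOTIONS)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[target][i] += lr * reward * grad
    return logits


def best_style(logits, target):
    """Most probable style for a target emotion under the learned policy."""
    probs = softmax(logits[target])
    return max(range(len(probs)), key=probs.__getitem__)
```

In this sketch the recognizer is fixed and noiseless, so the policy simply learns which style the recognizer associates with each target emotion. The sketch does not model the iterative training strategy mentioned in the abstract.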

research
06/01/2023

EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

There has been significant progress in emotional Text-To-Speech (TTS) sy...
research
06/13/2019

Adjusting Pleasure-Arousal-Dominance for Continuous Emotional Text-to-speech Synthesizer

Emotion is not limited to discrete categories of happy, sad, angry, fear...
research
01/10/2023

Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation

Despite advances in deep learning, current state-of-the-art speech emoti...
research
06/25/2018

The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems

In this paper, we present a database of emotional speech intended to be ...
research
01/29/2023

Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker

Voice synthesis has seen significant improvements in the past decade res...
research
04/15/2020

Mirror Ritual: Human-Machine Co-Construction of Emotion

Mirror Ritual is an interactive installation that challenges the existin...
research
07/05/2023

Going Retro: Astonishingly Simple Yet Effective Rule-based Prosody Modelling for Speech Synthesis Simulating Emotion Dimensions

We introduce two rule-based models to modify the prosody of speech synth...
