Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems

06/30/2022
by Hyun-Wook Yoon, et al.

This paper proposes an effective emotional text-to-speech (TTS) system with a pre-trained language model (LM)-based emotion prediction method. Unlike conventional systems that require auxiliary inputs such as manually defined emotion classes, our system directly estimates emotion-related attributes from the input text. Specifically, we use the generative pre-trained transformer GPT-3 to jointly predict an emotion class and its strength, which represent the coarse and fine properties of an emotion, respectively. These attributes are then combined in the emotional embedding space and used as conditional features of the TTS model for generating output speech signals. Consequently, the proposed system can produce emotional speech from text alone, without any auxiliary inputs. Furthermore, because GPT-3 can capture the emotional context across consecutive sentences, the proposed method can effectively handle paragraph-level generation of emotional speech.
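
To illustrate how a predicted emotion class and strength might be combined into a single conditioning vector, here is a minimal Python sketch. The abstract does not specify the combination rule, so linear interpolation from a neutral embedding toward the class embedding is an assumption made only for illustration. The embedding table, its 4-dimensional values, and the names `EMOTION_EMBEDDINGS`, `combine_emotion`, and `condition_paragraph` are likewise hypothetical. The LM prediction step is not shown; the (class, strength) pairs are taken as given.

```python
from typing import Dict, List, Tuple

# Hypothetical 4-dimensional emotion embeddings. In a real system these
# would be learned; the values here are placeholders for illustration.
EMOTION_EMBEDDINGS: Dict[str, List[float]] = {
    "neutral": [0.0, 0.0, 0.0, 0.0],
    "happy": [1.0, 0.5, 0.0, 0.0],
    "sad": [0.0, 0.0, 1.0, 0.5],
}


def combine_emotion(emotion: str, strength: float) -> List[float]:
    """Combine a coarse class and a fine strength into one embedding.

    Assumed rule: move from the neutral embedding toward the class
    embedding by a fraction `strength` in [0, 1].
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    neutral = EMOTION_EMBEDDINGS["neutral"]
    target = EMOTION_EMBEDDINGS[emotion]
    return [n + strength * (t - n) for n, t in zip(neutral, target)]


def condition_paragraph(
    predictions: List[Tuple[str, float]],
) -> List[List[float]]:
    """Map per-sentence (class, strength) predictions to embeddings."""
    return [combine_emotion(emotion, s) for emotion, s in predictions]


# One (class, strength) pair per sentence of a paragraph.
paragraph_conditions = condition_paragraph([("happy", 0.5), ("sad", 1.0)])
```

Each resulting vector would then serve as the conditional feature for synthesizing the corresponding sentence. A strength of 0 yields the neutral embedding and a strength of 1 yields the full class embedding.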

research
11/17/2020

Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis

This paper proposes a unified model to conduct emotion transfer, control...
research
04/28/2018

Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes

Recognizing emotions using few attribute dimensions such as arousal, val...
research
01/09/2021

Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech

We have applied two state-of-the-art speech synthesis techniques (unit s...
research
11/05/2019

Emotional Speech Synthesis with Rich and Granularized Control

This paper proposes an effective emotion control method for an end-to-en...
research
06/29/2023

Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data

We propose a method for speech-to-speech emotion-preserving translation t...
research
10/07/2021

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis

Recently, emotional speech synthesis has achieved remarkable performance...
research
01/29/2023

Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker

Voice synthesis has seen significant improvements in the past decade res...
