Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities

03/02/2023
by   Shijun Wang, et al.

State-of-the-art Text-To-Speech (TTS) models are capable of producing high-quality speech. The generated speech, however, is usually neutral in emotional expression, whereas one often wants fine-grained emotional control at the word or phoneme level. Although this remains challenging, the first TTS models that can control the voice through manually assigned emotion intensity have recently been proposed. Unfortunately, because they neglect intra-class distance, the intensity differences are often unrecognizable. In this paper, we propose a fine-grained controllable emotional TTS model that considers both inter- and intra-class distances and is able to synthesize speech with recognizable intensity differences. Our subjective and objective experiments demonstrate that our model exceeds two state-of-the-art controllable TTS models in controllability, emotion expressiveness, and naturalness.

research
11/17/2022

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

Although current neural text-to-speech (TTS) models are able to generate...
research
07/30/2018

Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis

Generating versatile and appropriate synthetic speech requires control o...
research
11/05/2019

emotional speech synthesis with rich and granularized control

This paper proposes an effective emotion control method for an end-to-en...
research
03/14/2023

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

Recent expressive text to speech (TTS) models focus on synthesizing emot...
research
06/27/2023

CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation

Existing fine-grained intensity regulation methods rely on explicit cont...
research
09/22/2022

Controllable Accented Text-to-Speech Synthesis

Accented text-to-speech (TTS) synthesis seeks to generate speech with an...
research
02/27/2023

SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing

Paralinguistic speech processing is important in addressing many issues,...
