EE-TTS: Emphatic Expressive TTS with Linguistic Information

05/20/2023
by Yi Zhong, et al.

While current TTS systems perform well at synthesizing high-quality speech, producing highly expressive speech remains a challenge. Emphasis, a critical factor in the expressiveness of speech, has attracted increasing attention. Previous works usually enhance emphasis by adding intermediate features, but they cannot guarantee the overall expressiveness of the speech. To address this, we propose Emphatic Expressive TTS (EE-TTS), which leverages multi-level linguistic information from syntax and semantics. EE-TTS contains an emphasis predictor that identifies appropriate emphasis positions from text, and a conditioned acoustic model that synthesizes expressive speech from emphasis and linguistic information. Experimental results indicate that EE-TTS outperforms the baseline, with MOS improvements of 0.49 and 0.67 in expressiveness and naturalness, respectively. EE-TTS also shows strong generalization across different datasets according to AB test results.

research
06/25/2018

EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System

We present EMPHASIS, an emotional phoneme-based acoustic model for speec...
research
08/25/2023

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

Neural networks have been able to generate high-quality single-sentence ...
research
02/16/2022

ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

Expressive text-to-speech (TTS) has become a hot research topic recently...
research
08/13/2021

Enhancing audio quality for expressive Neural Text-to-Speech

Artificial speech synthesis has made a great leap in terms of naturalnes...
research
05/21/2019

Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems

In this paper, we propose a high-quality generative text-to-speech (TTS)...
research
10/06/2021

Emphasis control for parallel neural TTS

The semantic information conveyed by a speech signal is strongly influen...
research
01/25/2023

A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

Expressive speech-to-speech translation (S2ST) aims to transfer prosodic...
