EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System

06/25/2018
by Hao Li, et al.

We present EMPHASIS, an emotional phoneme-based acoustic model for speech synthesis systems. EMPHASIS includes a phoneme duration prediction model and an acoustic parameter prediction model. It uses a CBHG-based regression network to model the dependencies between linguistic features and acoustic features. We modify the input and output layer structures of the network to improve its performance. For the linguistic features, we apply a feature grouping strategy to enhance emotional and prosodic features. The acoustic parameters are designed to be suitable for both the regression task and waveform reconstruction. EMPHASIS can synthesize speech in real time and generate expressive interrogative and exclamatory speech with high audio quality. EMPHASIS is designed to be a multilingual model and currently synthesizes Mandarin-English speech. In emotional speech synthesis experiments, it achieves better subjective results than other real-time speech synthesis systems.
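For readers unfamiliar with CBHG: it is the module introduced in Tacotron, built from a bank of 1-D convolutions, highway layers, and a bidirectional GRU. The sketch below is a minimal NumPy forward pass of a generic CBHG used as a frame-wise regressor from linguistic features to acoustic features. It is not the EMPHASIS network: the abstract does not describe the modified input and output layers, so every layer size, the function names (`init_params`, `cbhg_forward`), and the random initialization are hypothetical. Batch normalization and training are omitted for brevity.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def conv1d(x, w, b):
    """'Same'-padded 1-D convolution over time. x: (T, C_in), w: (k, C_in, C_out)."""
    k = w.shape[0]
    left = (k - 1) // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))  # zero padding keeps length T
    out = np.stack([np.einsum("kc,kco->o", xp[t:t + k], w) for t in range(x.shape[0])])
    return out + b


def maxpool_time(x, width=2):
    """Max-pool along time with stride 1, so the sequence length is preserved."""
    xp = np.pad(x, ((0, width - 1), (0, 0)), mode="edge")
    return np.stack([xp[t:t + width].max(axis=0) for t in range(x.shape[0])])


def highway(x, p):
    h = np.maximum(0.0, x @ p["Wh"] + p["bh"])  # candidate transform (ReLU)
    g = sigmoid(x @ p["Wt"] + p["bt"])          # transform gate
    return h * g + x * (1.0 - g)                # gated mix of transform and carry


def gru(x, p):
    """Unidirectional GRU over a (T, D) sequence; returns (T, H) hidden states."""
    h = np.zeros(p["Uz"].shape[0])
    outs = []
    for xt in x:
        z = sigmoid(xt @ p["Wz"] + h @ p["Uz"] + p["bz"])        # update gate
        r = sigmoid(xt @ p["Wr"] + h @ p["Ur"] + p["br"])        # reset gate
        n = np.tanh(xt @ p["Wn"] + (r * h) @ p["Un"] + p["bn"])  # candidate state
        h = (1.0 - z) * h + z * n
        outs.append(h)
    return np.stack(outs)


def init_params(d_in, d_out, K=4, bank_ch=8, proj_ch=16, hw_dim=16, n_hw=2, gru_h=8, seed=0):
    """Hypothetical sizes; random weights stand in for trained ones."""
    rng = np.random.default_rng(seed)
    w = lambda *shape: rng.normal(0.0, 0.1, shape)
    p = {
        "bank": [(w(k, d_in, bank_ch), np.zeros(bank_ch)) for k in range(1, K + 1)],
        "proj1": (w(3, K * bank_ch, proj_ch), np.zeros(proj_ch)),
        "proj2": (w(3, proj_ch, d_in), np.zeros(d_in)),  # back to d_in for the residual
        "pre_hw": (w(d_in, hw_dim), np.zeros(hw_dim)),
        "hw": [{"Wh": w(hw_dim, hw_dim), "bh": np.zeros(hw_dim),
                "Wt": w(hw_dim, hw_dim), "bt": np.full(hw_dim, -1.0)} for _ in range(n_hw)],
        "out": (w(2 * gru_h, d_out), np.zeros(d_out)),
    }
    for direction in ("fw", "bw"):
        p[direction] = {}
        for g in "zrn":
            p[direction]["W" + g] = w(hw_dim, gru_h)
            p[direction]["U" + g] = w(gru_h, gru_h)
            p[direction]["b" + g] = np.zeros(gru_h)
    return p


def cbhg_forward(x, p):
    """Map (T, d_in) linguistic frames to (T, d_out) acoustic frames."""
    # Conv1D bank: filter widths 1..K, ReLU, outputs stacked along channels.
    y = np.concatenate([np.maximum(0.0, conv1d(x, w, b)) for w, b in p["bank"]], axis=1)
    y = maxpool_time(y)
    # Two conv projections followed by a residual connection to the input.
    y = np.maximum(0.0, conv1d(y, *p["proj1"]))
    y = conv1d(y, *p["proj2"]) + x
    # Linear layer to the highway width, then the highway stack.
    y = y @ p["pre_hw"][0] + p["pre_hw"][1]
    for hp in p["hw"]:
        y = highway(y, hp)
    # Bidirectional GRU: run forward and on the time-reversed sequence, then concatenate.
    h = np.concatenate([gru(y, p["fw"]), gru(y[::-1], p["bw"])[::-1]], axis=1)
    # Linear output layer for frame-wise regression.
    return h @ p["out"][0] + p["out"][1]


if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=(20, 10))  # 20 frames, 10-dim features
    y = cbhg_forward(x, init_params(d_in=10, d_out=5))
    print(y.shape)  # (20, 5)
```

Because every stage preserves the time axis (stride-1 pooling and "same" padding), the network emits one acoustic vector per input frame, which is what a frame-level regression task needs.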
