Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech

01/09/2021
by Roberto Barra-Chicote, et al.

We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests was used to evaluate speech quality, emotion identification rates and emotional strength for the six emotions which we recorded: happiness, sadness, anger, surprise, fear and disgust. For the HMM-based method, we evaluated spectral and source components separately and identified which components contribute to which emotion. Our analysis shows that, although the HMM method produces significantly better neutral speech, the two methods produce emotional speech of similar quality, except for emotions having context-dependent prosodic patterns. Whilst synthetic speech produced using the unit selection method has better emotional strength scores than speech produced using the HMM-based method, the HMM-based method has the ability to manipulate the emotional strength. For emotions that are characterized by both spectral and prosodic components, synthetic speech produced using unit selection methods was more accurately identified by listeners. For emotions mainly characterized by prosodic components, HMM-based synthetic speech was more accurately identified. This finding differs from previous results regarding listener judgements of speaker similarity for neutral speech. We conclude that unit selection methods require improvements to prosodic modeling and that HMM-based methods require improvements to spectral modeling for emotional speech. Certain emotions cannot be reproduced well by either method.


