Emotion transplantation through adaptation in HMM-based speech synthesis

01/09/2021
by   Roberto Barra-Chicote, et al.
0

This paper proposes an emotion transplantation method capable of modifying a synthetic speech model through the use of CSMAPLR adaptation in order to incorporate emotional information learned from a different speaker model while maintaining the identity of the original speaker as much as possible. The proposed method relies on learning both emotional and speaker identity information by means of their adaptation function from an average voice model, and combining them into a single cascade transform capable of imbuing the desired emotion into the target speaker. This method is then applied to the task of transplanting four emotions (anger, happiness, sadness and surprise) into 3 male speakers and 3 female speakers and evaluated in a number of perceptual tests. The results of the evaluations show how the perceived naturalness for emotional text significantly favors the use of the proposed transplanted emotional speech synthesis when compared to traditional neutral speech synthesis, evidenced by a big increase in the perceived emotional strength of the synthesized utterances at a slight cost in speech quality. A final evaluation with a robotic laboratory assistant application shows how by using emotional speech we can significantly increase the students’ satisfaction with the dialog system, proving how the proposed emotion transplantation system provides benefits in real applications.

READ FULL TEXT

page 5

page 11

page 16

research
12/07/2021

Multi-speaker Emotional Text-to-speech Synthesizer

We present a methodology to train our multi-speaker emotional text-to-sp...
research
01/09/2021

Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech

We have applied two state-of-the-art speech synthesis techniques (unit s...
research
09/27/2021

Emotional Speech Synthesis for Companion Robot to Imitate Professional Caregiver Speech

When people try to influence others to do something, they subconsciously...
research
05/10/2021

MASS: Multi-task Anthropomorphic Speech Synthesis Framework

Text-to-Speech (TTS) synthesis plays an important role in human-computer...
research
05/27/2019

EG-GAN: Cross-Language Emotion Gain Synthesis based on Cycle-Consistent Adversarial Networks

Despite remarkable contributions from existing emotional speech synthesi...
research
06/11/2020

FastPitch: Parallel Text-to-speech with Pitch Prediction

We present FastPitch, a fully-parallel text-to-speech model based on Fas...
research
11/14/2021

Textless Speech Emotion Conversion using Decomposed and Discrete Representations

Speech emotion conversion is the task of modifying the perceived emotion...

Please sign up or login with your details

Forgot password? Click here to reset