MES-P: an Emotional Tonal Speech Dataset in Mandarin Chinese with Distal and Proximal Labels

08/30/2018
by Zhongzhe Xiao, et al.

Emotion shapes all aspects of our interpersonal and intellectual experiences. Its automatic analysis therefore has many applications, e.g., human-machine interfaces. In this paper, we propose an emotional tonal speech dataset, namely the Mandarin Chinese Emotional Speech Dataset - Portrayed (MES-P), with both distal and proximal labels. In contrast with state-of-the-art emotional speech datasets, which focus only on perceived emotions, the proposed MES-P dataset includes not only perceived emotions with their proximal labels but also intended emotions with distal labels. This makes it possible to study human emotional intelligence, i.e., people's ability to express emotions and their skill in understanding them. The dataset thus explicitly accounts for differences between intended and perceived emotions in speech signals and enables studies of the emotional misunderstandings that often occur in real life. Furthermore, the proposed MES-P dataset captures a main feature of tonal languages, i.e., tonal variations, and provides recorded emotional speech samples whose tonal variations match the tonal distribution of real-life Mandarin Chinese. The dataset also features emotion intensity variations: it includes both moderate and intense versions of recordings for joy, anger, and sadness, in addition to neutral speech. The collected speech samples are rated in valence-arousal (VA) space as continuous coordinate locations, resulting in an emotional distribution pattern in 2D VA space. The consistency between the speakers' emotional intentions and the listeners' perceptions is also studied using Cohen's Kappa coefficients. Finally, we carry out extensive experiments using a baseline on MES-P for automatic emotion recognition and compare the results with human emotional intelligence.
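
As an illustration of the consistency measure mentioned above, Cohen's Kappa compares the observed agreement p_o between two label sequences with the agreement p_e expected by chance, as kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch, not the authors' evaluation code: the function name `cohens_kappa` and the toy label lists are assumptions made up for this example and are not taken from MES-P.

```python
from collections import Counter


def cohens_kappa(intended, perceived):
    """Cohen's Kappa between two equally long sequences of categorical labels."""
    if len(intended) != len(perceived) or not intended:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(intended)
    # Observed agreement: fraction of samples where both labels match.
    p_o = sum(a == b for a, b in zip(intended, perceived)) / n
    # Chance agreement: product of the two marginal label frequencies, summed over labels.
    count_i = Counter(intended)
    count_p = Counter(perceived)
    p_e = sum(count_i[label] * count_p[label] for label in set(count_i) | set(count_p)) / (n * n)
    if p_e == 1.0:
        # Both sequences use a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy labels (NOT data from MES-P): the speaker's intended emotion
# (distal label) versus one listener's perceived emotion (proximal label).
intended = ["joy", "joy", "anger", "anger", "sadness", "sadness", "neutral", "neutral"]
perceived = ["joy", "neutral", "anger", "anger", "sadness", "neutral", "neutral", "neutral"]

kappa = cohens_kappa(intended, perceived)
print(f"Cohen's Kappa: {kappa:.3f}")
```

In this toy case the listener agrees with the speaker on 6 of 8 samples (p_o = 0.75) while chance agreement is p_e = 0.25, so kappa is about 0.667. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.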


