How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition

11/24/2021
by Haoran Sun, et al.
The way humans encode emotion into speech signals is complex. For instance, an angry man may raise his pitch and speaking rate and use impolite words. In this paper, we present a preliminary study of various emotional factors and investigate how each of them impacts modern emotion recognition systems. The key tool of our study is the recently presented SpeechFlow model, which allows us to decompose speech signals into separate information factors (content, pitch, rhythm). Based on this decomposition, we carefully study the performance of each information component and of their combinations. We conduct the study on three different speech emotion corpora and use an attention-based convolutional RNN as the emotion classifier. Our results show that rhythm is the most important component for emotional expression. Moreover, the cross-corpus results are very poor (even worse than random guessing), demonstrating that present speech emotion recognition models are rather weak. Interestingly, removing one or several unimportant components can improve the cross-corpus results. This demonstrates the potential of the decomposition approach for generalizable emotion recognition.

research
09/27/2017

Research on several key technologies in practical speech emotion recognition

In this dissertation the practical speech emotion recognition technology...
research
09/25/2017

Towards Indonesian Speech-Emotion Automatic Recognition (I-SpEAR)

Even though speech-emotion recognition (SER) has been receiving much att...
research
06/30/2023

Empirical Interpretation of the Relationship Between Speech Acoustic Context and Emotion Recognition

Speech emotion recognition (SER) is vital for obtaining emotional intell...
research
11/14/2022

Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition

Speech emotion recognition (SER) plays a vital role in improving the int...
research
04/25/2022

Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Speech emotion recognition systems have high prediction latency because ...
research
08/15/2020

Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition

Categorical speech emotion recognition is typically performed as a seque...
research
10/14/2022

Training speech emotion classifier without categorical annotations

There are two paradigms of emotion representation, categorical labeling ...
