Effect of different splitting criteria on the performance of speech emotion recognition

10/26/2022
by Bagus Tris Atmaja, et al.

Traditional speech emotion recognition (SER) evaluations have been performed only under a speaker-independent condition, and some did not even evaluate their results under that condition. This paper highlights the importance of splitting training and test data for SER by script, known as the sentence-open or text-independent criterion. The results show that employing the sentence-open criterion degraded SER performance. This finding indicates how difficult it is to recognize emotion from speech when the linguistic information embedded in the acoustic information differs. Surprisingly, the text-independent criterion consistently performed worse than the speaker+text-independent criterion. The full order of splitting criteria by SER performance, from the most difficult to the easiest, is text-independent, speaker+text-independent, speaker-independent, and speaker+text-dependent. The gap between the speaker+text-independent and text-independent criteria was smaller than the gaps between the other criteria, which reinforces the difficulty of recognizing emotion from speech in different sentences.
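The four splitting criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the `Utterance` record, the `split` function, the toy corpus, and the rule used for the speaker+text-dependent case (every third utterance goes to the test set) are all assumptions made for illustration. The sketch only shows which speakers and sentences are allowed to overlap between training and test data under each criterion.

```python
from itertools import product
from typing import NamedTuple


class Utterance(NamedTuple):
    # Hypothetical record: one recording of a sentence by a speaker.
    speaker: str
    text: str
    emotion: str


def split(utts, criterion, test_speakers=frozenset(), test_texts=frozenset(), every=3):
    """Return (train, test) lists for one of the four splitting criteria."""
    train, test = [], []
    for i, u in enumerate(utts):
        spk_held = u.speaker in test_speakers  # speaker reserved for testing
        txt_held = u.text in test_texts        # sentence reserved for testing
        if criterion == "speaker+text-dependent":
            # Assumed rule: plain holdout, so speakers and sentences overlap.
            (test if i % every == 0 else train).append(u)
        elif criterion == "speaker-independent":
            # Test speakers never appear in training; sentences may repeat.
            (test if spk_held else train).append(u)
        elif criterion == "text-independent":
            # Test sentences never appear in training; speakers may repeat.
            (test if txt_held else train).append(u)
        elif criterion == "speaker+text-independent":
            # Neither test speakers nor test sentences appear in training.
            if spk_held and txt_held:
                test.append(u)
            elif not spk_held and not txt_held:
                train.append(u)
            # Utterances held out on only one axis are discarded.
        else:
            raise ValueError(f"unknown criterion: {criterion}")
    return train, test


# Toy corpus (made up): 4 speakers x 4 sentences x 2 emotions.
corpus = [
    Utterance(s, t, e)
    for s, t, e in product(
        ["s1", "s2", "s3", "s4"],
        ["t1", "t2", "t3", "t4"],
        ["neutral", "angry"],
    )
]

for name in ("speaker+text-dependent", "speaker-independent",
             "text-independent", "speaker+text-independent"):
    train, test = split(corpus, name, test_speakers={"s4"}, test_texts={"t4"})
    print(name, len(train), len(test))
```

Note that under the speaker+text-independent rule in this sketch, utterances that share either a speaker or a sentence with the test set are dropped from training, so that split uses less data than the others. Whether the paper handles this case the same way is not stated in the abstract.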

research
03/30/2018

Reusing Neural Speech Representations for Auditory Emotion Recognition

Acoustic emotion recognition aims to categorize the affective state of t...
research
02/02/2022

Speaker Normalization for Self-supervised Speech Emotion Recognition

Large speech emotion recognition datasets are hard to obtain, and small ...
research
02/12/2020

x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

In this work, we explore the dependencies between speaker recognition an...
research
06/04/2018

DNN-HMM based Speaker Adaptive Emotion Recognition using Proposed Epoch and MFCC Features

Speech is produced when time varying vocal tract system is excited with ...
research
11/04/2019

Speaker-invariant Affective Representation Learning via Adversarial Training

Representation learning for speech emotion recognition is challenging du...
research
01/19/2022

Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in Speech

The prediction of valence from speech is an important, but challenging p...
research
06/18/2021

Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition

By implicitly recognizing a user based on his/her speech input, speaker ...
