Ensembling Multilingual Pre-Trained Models for Predicting Multi-Label Regression Emotion Share from Speech

09/20/2023
by Bagus Tris Atmaja, et al.

Speech emotion recognition has evolved from research to practical applications. Previous studies of emotion recognition from speech have focused on developing models on specific datasets such as IEMOCAP. The scarcity of emotion data makes it difficult to evaluate models on other datasets and to evaluate speech emotion recognition models in a multilingual setting. This paper proposes an ensemble learning method that fuses the outputs of pre-trained models for emotion share recognition from speech. The models were chosen to accommodate multilingual data in English and Spanish. The results show that ensemble learning improves on both the single-model baseline and the previous best model, which used late fusion. Performance is measured using the Spearman rank correlation coefficient, since the task is a regression problem with ranked values. The ensemble achieves a Spearman rank correlation coefficient of 0.537 on the test set and 0.524 on the development set. These scores are higher than those of a previous fusion method trained on monolingual data, which achieved 0.476 on the test set and 0.470 on the development set.
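The evaluation idea described above can be illustrated with a minimal sketch. The abstract does not state the fusion rule or how scores are aggregated across emotions, so the unweighted averaging in `ensemble_mean` and the per-label averaging in `mean_spearman` are assumptions, and all function names and toy data are hypothetical. Spearman's rho is computed here as the Pearson correlation of rank-transformed values; a library routine such as `scipy.stats.spearmanr` could be used instead.

```python
import numpy as np


def rank_average(x):
    """Return 1-based ranks of x, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks inside each group of equal values by their mean.
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman(y_true, y_pred):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return float(np.corrcoef(rank_average(y_true), rank_average(y_pred))[0, 1])


def ensemble_mean(predictions):
    """Fuse predictions of shape (n_models, n_samples, n_labels).

    Assumption: unweighted averaging; the paper's actual rule may differ.
    """
    return np.mean(np.asarray(predictions, dtype=float), axis=0)


def mean_spearman(y_true, y_pred):
    """Average Spearman's rho over the label (emotion) columns (assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scores = [spearman(y_true[:, j], y_pred[:, j]) for j in range(y_true.shape[1])]
    return float(np.mean(scores))


# Toy data (hypothetical): 3 models, 50 utterances, 4 emotion-share labels.
rng = np.random.default_rng(0)
targets = rng.random((50, 4))
model_outputs = [targets + rng.normal(scale=0.3, size=targets.shape) for _ in range(3)]

fused = ensemble_mean(model_outputs)
for i, output in enumerate(model_outputs):
    print(f"model {i}: rho = {mean_spearman(targets, output):.3f}")
print(f"ensemble: rho = {mean_spearman(targets, fused):.3f}")
```

Because Spearman's rho depends only on the ordering of predictions, averaging is one simple way to fuse models whose outputs share a scale; models with differently scaled outputs might need rank or z-score normalization before fusion.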

