Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition

08/17/2023
by   Anant Singh, et al.

Recent advances in transformer-based speech representation models have transformed speech processing. However, little research has evaluated these models for speech emotion recognition (SER) across multiple languages or examined their internal representations. This article addresses these gaps by presenting a comprehensive benchmark for SER with eight speech representation models and six languages. We conduct probing experiments to gain insight into the inner workings of these models for SER. We find that using features from a single optimal layer of a speech model reduces the error rate by 32% on average across seven datasets, compared with systems that use features from all layers. We also achieve state-of-the-art results for German and Persian. Our probing results indicate that the middle layers of speech models capture the most important emotional information for SER.
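As a rough illustration of what selecting a single optimal layer can look like in practice, the sketch below probes each layer's utterance-level features with a simple classifier and keeps the layer with the lowest validation error. It is a minimal sketch under stated assumptions, not the authors' pipeline: the features are synthetic, the nearest-centroid probe is a stand-in for whatever classifier the paper uses, and all function names (`make_synthetic_layers`, `nearest_centroid_error`, `select_best_layer`, `relative_error_reduction`) are hypothetical. The last helper assumes the reported 32% is a relative reduction in error rate, which the abstract does not state explicitly.

```python
import random
from statistics import fmean


def make_synthetic_layers(n_layers=5, n_per_class=40, dim=4,
                          informative_layer=2, seed=0):
    """Build toy per-layer features (assumption: stands in for real model outputs).

    Only `informative_layer` separates the two classes; other layers are noise.
    Returns (layers, labels), where layers[l][i] is the vector for utterance i.
    """
    rng = random.Random(seed)
    labels = [c for c in (0, 1) for _ in range(n_per_class)]
    layers = []
    for layer in range(n_layers):
        shift = 3.0 if layer == informative_layer else 0.0
        layers.append([[rng.gauss(shift * y, 1.0) for _ in range(dim)]
                       for y in labels])
    return layers, labels


def nearest_centroid_error(train_x, train_y, val_x, val_y):
    """Fit a nearest-centroid probe on train data; return validation error rate."""
    by_label = {}
    for x, y in zip(train_x, train_y):
        by_label.setdefault(y, []).append(x)
    # One centroid per class: the mean of each feature dimension.
    centroids = {y: [fmean(col) for col in zip(*xs)]
                 for y, xs in by_label.items()}

    def predict(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2
                                     for a, b in zip(x, centroids[y])))

    wrong = sum(predict(x) != y for x, y in zip(val_x, val_y))
    return wrong / len(val_y)


def select_best_layer(train_layers, train_y, val_layers, val_y):
    """Probe every layer and return (best_layer_index, per_layer_errors)."""
    errors = [nearest_centroid_error(tr, train_y, va, val_y)
              for tr, va in zip(train_layers, val_layers)]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors


def relative_error_reduction(baseline_error, new_error):
    """Fraction by which the error rate drops relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    train_layers, train_y = make_synthetic_layers(seed=0)
    val_layers, val_y = make_synthetic_layers(seed=1)
    best, errors = select_best_layer(train_layers, train_y, val_layers, val_y)
    print("best layer:", best)
    print("per-layer validation error:", [round(e, 3) for e in errors])
```

With real models, the per-layer vectors would typically come from pooling each layer's hidden states over time, and the chosen layer should be selected on held-out validation data rather than on the test set.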

research
08/19/2022

Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition

Robust speech emotion recognition relies on the quality of the speech fe...
research
12/23/2017

Variational Autoencoders for Learning Latent Representations of Speech Emotion

Latent representation of data in unsupervised fashion is a very interest...
research
10/31/2022

Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search

Speech emotion recognition (SER) classifies audio into emotion categorie...
research
12/23/2017

Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study

Learning the latent representation of data in unsupervised fashion is a ...
research
10/07/2021

SERAB: A multi-lingual benchmark for speech emotion recognition

Recent developments in speech emotion recognition (SER) often leverage d...
research
06/12/2023

MFAS: Emotion Recognition through Multiple Perspectives Fusion Architecture Search Emulating Human Cognition

Speech emotion recognition aims to identify and analyze emotional states...
research
01/07/2022

A New Amharic Speech Emotion Dataset and Classification Benchmark

In this paper we present the Amharic Speech Emotion Dataset (ASED), whic...
