Learning Speech Emotion Representations in the Quaternion Domain

04/05/2022
by Eric Guizzo, et al.

The modeling of human emotion expression in speech signals is an important yet challenging task. The high resource demand of speech emotion recognition models and the general scarcity of emotion-labelled data are obstacles to the development and application of effective solutions in this field. In this paper, we present an approach that addresses both difficulties jointly. Our method, named RH-emo, is a novel semi-supervised architecture aimed at extracting quaternion embeddings from real-valued monaural spectrograms, enabling the use of quaternion-valued networks for speech emotion recognition tasks. RH-emo is a hybrid real/quaternion autoencoder network that consists of a real-valued encoder whose output feeds, in parallel, a real-valued emotion classifier and a quaternion-valued decoder. On the one hand, the classifier makes it possible to optimize each latent axis of the embeddings for the classification of a specific emotion-related characteristic: valence, arousal, dominance, and overall emotion. On the other hand, the quaternion reconstruction enables the latent dimension to develop the intra-channel correlations that are required for an effective representation as a quaternion entity. We test our approach on speech emotion recognition tasks using four popular datasets (Iemocap, Ravdess, EmoDb, and Tess), comparing the performance of three well-established real-valued CNN architectures (AlexNet, ResNet-50, and VGG) with their quaternion-valued equivalents fed with the embeddings created by RH-emo. We obtain a consistent improvement in test accuracy on all datasets while drastically reducing the resource demand of the models. Moreover, we perform additional experiments and ablation studies that confirm the effectiveness of our approach. The RH-emo repository is available at: https://github.com/ispamm/rhemo.
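To make the idea of a quaternion-valued network more concrete, here is a minimal sketch of a generic quaternion dense layer, the basic building block such networks are made of. It is not the RH-emo code from the repository above. It assumes each quaternion is stored as a plain 4-tuple (r, i, j, k). The helper `embeddings_to_quaternions` is hypothetical, and so is its channel order: it only illustrates how four real latent axes (e.g. overall emotion, valence, arousal, dominance) could be packed into one quaternion per position. Because each layer weight is a single quaternion applied through the Hamilton product, a layer mapping n_in to n_out quaternion units stores 4·n_in·n_out real weights. A real-valued layer over the same 4·n_in inputs and 4·n_out outputs needs four times as many, which is plausibly part of why the quaternion models are lighter.

```python
"""Sketch of a generic quaternion dense layer (not the RH-emo implementation).
A quaternion is represented as a 4-tuple (r, i, j, k)."""
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p ⊗ q; note that it is not commutative."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_dense(weights: List[List[Quaternion]], x: List[Quaternion]) -> List[Quaternion]:
    """y[o] = sum_i weights[o][i] ⊗ x[i] (bias omitted).
    Every output component mixes all four components of every input."""
    out = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, xi in zip(row, x):
            acc = tuple(a + b for a, b in zip(acc, hamilton(w, xi)))
        out.append(acc)
    return out


def param_counts(n_in: int, n_out: int) -> Tuple[int, int]:
    """Real weights of an n_in -> n_out quaternion layer vs. an equivalent
    real layer with 4*n_in inputs and 4*n_out outputs (biases omitted)."""
    return 4 * n_in * n_out, (4 * n_in) * (4 * n_out)


def embeddings_to_quaternions(r, i, j, k) -> List[Quaternion]:
    """Hypothetical helper: pairs four equal-length real latent channels
    element-wise into quaternions. The channel-to-axis order is an assumption."""
    return list(zip(r, i, j, k))


if __name__ == "__main__":
    x = embeddings_to_quaternions([1.0, 0.5], [0.0, 0.5], [0.0, 0.0], [0.0, 1.0])
    W = [[(0.0, 1.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)]]  # one output unit: weights i and 1
    print(quaternion_dense(W, x))  # [(0.5, 1.5, 0.0, 1.0)]
    print(param_counts(64, 32))    # (8192, 32768)
```

The Hamilton product is what couples the four components: i ⊗ j = k while j ⊗ i = -k. This weight sharing across components is what lets a quaternion layer exploit correlations between the latent axes, and it is the reason RH-emo trains its embeddings to develop such correlations.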
