Speech Emotion Recognition Using Quaternion Convolutional Neural Networks

10/31/2021
by Aneesh Muppidi, et al.

Although speech recognition has become a widespread technology, inferring emotion from speech signals remains a challenge. To address this problem, this paper proposes a quaternion convolutional neural network (QCNN) based speech emotion recognition (SER) model in which Mel-spectrogram features of speech signals are encoded in an RGB quaternion domain. We show that our QCNN-based SER model outperforms real-valued methods on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS, 8 classes) dataset, achieving, to the best of our knowledge, state-of-the-art results. The QCNN also achieves results comparable to state-of-the-art methods on the Interactive Emotional Dyadic Motion Capture (IEMOCAP, 4 classes) and Berlin EMO-DB (7 classes) datasets. Specifically, the model achieves accuracies of 77.87%, 70.46%, and 88.78% on the RAVDESS, IEMOCAP, and EMO-DB datasets, respectively. In addition, our results show that the quaternion unit structure is better able to encode internal dependencies, which significantly reduces model size compared to other methods.
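To make the idea of an "RGB quaternion domain" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the common convention of storing each RGB pixel of a rendered Mel-spectrogram image as a pure quaternion 0 + R*i + G*j + B*k, and it shows the Hamilton product that quaternion layers generally build on. The function names `rgb_to_quaternion` and `hamilton_product` are hypothetical, and the abstract does not state which encoding convention the paper uses.

```python
import numpy as np

def rgb_to_quaternion(rgb):
    """Map an (H, W, 3) RGB image to (H, W, 4) pure quaternions.

    Assumed convention: real part 0, then R, G, B as the i, j, k parts.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    zeros = np.zeros(rgb.shape[:-1] + (1,), dtype=rgb.dtype)
    return np.concatenate([zeros, rgb], axis=-1)

def hamilton_product(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays in (r, i, j, k) order."""
    a1, b1, c1, d1 = np.moveaxis(np.asarray(p, dtype=np.float64), -1, 0)
    a2, b2, c2, d2 = np.moveaxis(np.asarray(q, dtype=np.float64), -1, 0)
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    ], axis=-1)

# Toy stand-in for an RGB-rendered Mel-spectrogram (random values, not real audio).
image = np.random.default_rng(0).random((4, 5, 3))
pixels = rgb_to_quaternion(image)           # shape (4, 5, 4)

# One quaternion weight (four numbers) mixes all four components of every pixel.
weight = np.array([0.5, -0.1, 0.3, 0.2])    # illustrative values only
mixed = hamilton_product(weight, pixels)    # shape (4, 5, 4)
```

In this sketch a single four-component weight couples the colour channels of each pixel through the Hamilton product, which is one way to picture how a quaternion unit can capture dependencies between channels with few parameters.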

research
06/23/2018

Evaluating Gammatone Frequency Cepstral Coefficients with Neural Networks for Emotion Recognition from Speech

Current approaches to speech emotion recognition focus on speech feature...
research
09/15/2021

FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition

Using mel-spectrograms over conventional MFCCs features, we assess the a...
research
03/09/2023

Hierarchical Network with Decoupled Knowledge Distillation for Speech Emotion Recognition

The goal of Speech Emotion Recognition (SER) is to enable computers to r...
research
10/08/2021

Affective Burst Detection from Speech using Kernel-fusion Dilated Convolutional Neural Networks

As speech-interfaces are getting richer and widespread, speech emotion r...
research
02/03/2021

Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation

In Speech Emotion Recognition (SER), emotional characteristics often app...
research
03/28/2022

Continuous Metric Learning For Transferable Speech Emotion Recognition and Embedding Across Low-resource Languages

Speech emotion recognition (SER) refers to the technique of inferring th...
