Novel Dual-Channel Long Short-Term Memory Compressed Capsule Networks for Emotion Recognition

12/26/2021
by Ismail Shahin, et al.

Recent research on speech emotion recognition (SER) has advanced considerably through the use of MFCC spectrogram features and neural network approaches such as convolutional neural networks (CNNs). Capsule networks (CapsNet) have gained recognition as alternatives to CNNs because of their larger capacity for hierarchical representation. This research introduces a novel text-independent and speaker-independent SER architecture in which a dual-channel long short-term memory compressed-CapsNet (DC-LSTM COMP-CapsNet) algorithm is proposed based on the structural features of CapsNet. The proposed classifier provides energy efficiency and an adequate compression method for speech emotion recognition, neither of which the original CapsNet structure delivers. Moreover, a grid search is used to attain optimal solutions. The results show improved performance and reduced training and testing running time. Four speech datasets are used to evaluate the algorithm: the Arabic Emirati-accented corpus, the English Speech Under Simulated and Actual Stress (SUSAS) corpus, the English Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) corpus, and the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D). This work finds that MFCC delta-delta is the best feature extraction method among the known methods compared. Using the four datasets and MFCC delta-delta features, DC-LSTM COMP-CapsNet surpasses all the state-of-the-art systems, classical classifiers, CNN, and the original CapsNet. On the Arabic Emirati-accented corpus, the proposed work yields an average emotion recognition accuracy of 89.3%, compared with baselines that include k-nearest neighbor, radial basis function, and naive Bayes, the last of which reaches 31.9%.
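The abstract names MFCC delta-delta as the best-performing feature set but does not describe how the features are computed. The sketch below shows one common way to derive delta and delta-delta coefficients from a sequence of MFCC frames, using the standard regression formula over neighboring frames. The function names, the window width of 2, and the edge-clamping behavior are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import List

# One inner list of coefficients per time frame (hypothetical layout).
Frames = List[List[float]]


def delta(frames: Frames, width: int = 2) -> Frames:
    """Regression-based time derivative of each coefficient.

    `width` is the number of neighbouring frames used on each side
    (an assumed value, not one reported in the paper).
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Frames = []
    for t in range(n_frames):
        row = []
        for c in range(n_coeffs):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so edge frames reuse the first/last frame.
                ahead = frames[min(t + n, n_frames - 1)][c]
                behind = frames[max(t - n, 0)][c]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def with_delta_delta(mfcc: Frames, width: int = 2) -> Frames:
    """Concatenate static, delta, and delta-delta coefficients per frame."""
    d1 = delta(mfcc, width)   # first-order differences (delta)
    d2 = delta(d1, width)     # second-order differences (delta-delta)
    return [m + a + b for m, a, b in zip(mfcc, d1, d2)]
```

With this sketch, an input whose coefficients grow linearly over time gives a constant delta and a zero delta-delta for frames away from the edges, which is a quick sanity check on the formula.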


