
Speech Emotion Recognition Using Deep Sparse Auto-Encoder Extreme Learning Machine with a New Weighting Scheme and Spectro-Temporal Features Along with Classical Feature Selection

by   Fatemeh Daneshfar, et al.

Affective computing plays an important role in human-machine interaction. In this paper, a speech emotion recognition (SER) system based on the speech signal is proposed, which introduces new techniques at different stages of processing. The system consists of three stages: feature extraction, feature selection, and feature classification. In the first stage, a rich set of long-term statistical features is extracted from both the speech signal and the glottal-waveform signal, combining new and diverse features such as prosodic, spectral, and spectro-temporal features. One of the challenges of SER systems is distinguishing correlated emotions; these features are good discriminators of speech emotions and increase the system's ability to recognize both similar and distinct emotions. A feature vector with this many dimensions naturally contains redundancy, so in the second stage its dimensionality is reduced using classical feature selection techniques as well as a new quantum-inspired technique. In the third stage, the optimized feature vector is classified by a weighted deep sparse extreme learning machine (ELM) classifier. The classifier performs classification in three steps: sparse random feature learning, orthogonal random projection using the singular value decomposition (SVD) technique, and, finally, discriminative classification using generalized Tikhonov regularization. Furthermore, many existing emotional datasets suffer from imbalanced class distributions, which increases classification error and degrades system performance. A new weighting method is therefore also proposed to deal with class imbalance, which is more efficient than existing weighting methods. The proposed method is evaluated on three standard emotional databases.
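To make the classification stage concrete, the following is a minimal sketch of a weighted ELM classifier: random hidden features, per-sample weights inversely proportional to class frequency (one common weighting scheme, not necessarily the paper's proposed one), and output weights solved via Tikhonov (ridge) regularization. The function names, the sigmoid activation, and the inverse-frequency weighting are illustrative assumptions, not the authors' exact formulation, which also includes sparse auto-encoder layers and SVD-based projection.

```python
import numpy as np

def weighted_elm_fit(X, y, n_hidden=64, C=1.0, seed=0):
    """Fit a single-layer weighted ELM.

    Assumed simplifications vs. the paper: one random hidden layer
    (no deep sparse auto-encoder stack), sigmoid activation, and
    inverse-class-frequency sample weights.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    classes = np.unique(y)
    # One-hot target matrix T, shape (n, n_classes)
    T = (y[:, None] == classes[None, :]).astype(float)
    # ELM: hidden-layer weights are random and never trained
    W = rng.standard_normal((d, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Class-balancing weights: rare classes get larger sample weights
    counts = T.sum(axis=0)
    w = T @ (n / (len(classes) * counts))          # weight per sample
    sw = np.sqrt(w)[:, None]
    Hw, Tw = H * sw, T * sw
    # Tikhonov-regularized least squares for the output weights beta
    beta = np.linalg.solve(Hw.T @ Hw + np.eye(n_hidden) / C, Hw.T @ Tw)
    return W, b, beta, classes

def weighted_elm_predict(model, X):
    W, b, beta, classes = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return classes[np.argmax(H @ beta, axis=1)]
```

On an imbalanced two-cluster toy problem, the weighting keeps the minority class from being swamped by the regularized least-squares fit; the closed-form solve is what makes ELM training fast compared with gradient-based networks.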




Optimizing Speech Emotion Recognition using Manta-Ray Based Feature Selection

Emotion recognition from audio signals has been regarded as a challengin...

A Study of Language and Classifier-independent Feature Analysis for Vocal Emotion Recognition

Every speech signal carries implicit information about the emotions, whi...

Biologically inspired speech emotion recognition

Conventional feature-based classification methods do not apply well to a...

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

Feature subspace selection is an important part in speech emotion recogn...

Emotion Recognition in Low-Resource Settings: An Evaluation of Automatic Feature Selection Methods

Research in automatic emotion recognition has seldom addressed the issue...

Learning spectro-temporal features with 3D CNNs for speech emotion recognition

In this paper, we propose to use deep 3-dimensional convolutional networ...

Deep Residual Local Feature Learning for Speech Emotion Recognition

Speech Emotion Recognition (SER) is becoming a key role in global busine...