
Speech Emotion Recognition System by Quaternion Nonlinear Echo State Network

by Fatemeh Daneshfar, et al.

The echo state network (ESN) is a powerful and efficient tool for modeling dynamic data. However, many existing ESNs are limited in their ability to model high-dimensional data. The most important limitation is the high memory consumption of the reservoir structure, which has prevented increasing the number of reservoir units and making full use of the network's capabilities. One way to address this problem is quaternion algebra. Because a quaternion has four components, high-dimensional data are easily represented, and the Hamilton product captures external relations between multidimensional features with fewer parameters than a real-valued representation. In addition to the memory problem, the linear output of the ESN restricts its processing capacity, because it cannot effectively exploit the higher-order statistics of the features provided by the nonlinear dynamics of the reservoir neurons. This research presents a new ESN-based structure in which quaternion algebra, with a simple split function, is used to compress the network data, and the linear output combiner is replaced by a multidimensional bilinear filter that performs the nonlinear computations of the output layer. In addition, two-dimensional principal component analysis is used to reduce the amount of data passed to the bilinear filter. The coefficients and weights of the quaternion nonlinear ESN (QNESN) are optimized using a genetic algorithm. To evaluate the effectiveness of the proposed model against previous methods, speech emotion recognition (SER) experiments were performed on the EMODB, SAVEE, and IEMOCAP emotional speech datasets. Comparisons show that the proposed QNESN performs better than the ESN and most current SER systems.
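The parameter saving attributed to Hamilton multiplication can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the 400-unit layer size are hypothetical, and the weight counts assume a dense layer in which each quaternion weight (four real numbers) connects one quaternion input to one quaternion output.

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def real_weight_count(n_in, n_out):
    """Real parameters in a dense real-valued layer (hypothetical helper)."""
    return n_in * n_out


def quaternion_weight_count(n_in, n_out):
    """Real parameters when the same n_in and n_out real dimensions are
    grouped into quaternions (assumes both are multiples of 4).
    Each quaternion weight stores 4 real numbers."""
    return (n_in // 4) * (n_out // 4) * 4


# Basis check: i * j = k
print(hamilton_product((0, 1, 0, 0), (0, 0, 1, 0)))  # (0, 0, 0, 1)

# Hypothetical 400-to-400 connection: 160000 real weights vs 40000
print(real_weight_count(400, 400), quaternion_weight_count(400, 400))
```

Under these assumptions the quaternion layer needs one quarter of the real parameters, because each quaternion weight is reused across all four components of the input through the Hamilton product.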



