Evaluating raw waveforms with deep learning frameworks for speech emotion recognition

07/06/2023
by Zeynep Hilal Kilimci et al.

Speech emotion recognition is a challenging task in the speech-processing field, and the feature-extraction stage is therefore crucial for representing and processing speech signals. In this work, we present a model that feeds raw audio files directly into deep neural networks, without any feature-extraction stage, to recognize emotions on six data sets: EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS. To demonstrate the contribution of the proposed model, traditional feature-extraction techniques, namely the mel-scale spectrogram and mel-frequency cepstral coefficients (MFCCs), are combined with machine learning algorithms, ensemble learning methods, and deep and hybrid deep learning techniques. Support vector machines, decision trees, naive Bayes, and random forests are evaluated as machine learning algorithms, while majority voting and stacking are assessed as ensemble learning techniques. Moreover, convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and a hybrid CNN-LSTM model are evaluated as deep learning techniques and compared with the machine learning and ensemble learning methods. To demonstrate the effectiveness of the proposed model, we compare it against state-of-the-art studies. Based on the experimental results, the CNN model outperforms existing approaches with 95.86% accuracy on the TESS+RAVDESS data set using raw audio files, thereby establishing a new state of the art. The proposed model also achieves 90.34% accuracy on EMO-DB with the CNN model, 90.42% accuracy on TESS with the LSTM model, and accuracies of 69.72% and 85.76% on the remaining categorization problems.
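The core idea, classifying emotions directly from the raw waveform rather than from hand-crafted features such as MFCCs, can be sketched as a 1D convolution applied to the audio samples themselves. The following is a minimal illustrative sketch, not the authors' actual architecture: the layer sizes, number of filters, emotion labels, and random weights are all assumptions chosen only to show the data flow from waveform to class probabilities.

```python
import numpy as np

# Hypothetical emotion labels; the paper's data sets use larger label inventories.
EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

def conv1d(signal, kernels, stride=4):
    """Valid 1D convolution of a mono waveform with a bank of filter kernels."""
    k = kernels.shape[1]
    n_steps = (len(signal) - k) // stride + 1
    out = np.empty((kernels.shape[0], n_steps))
    for t in range(n_steps):
        window = signal[t * stride : t * stride + k]
        out[:, t] = kernels @ window  # one output per filter
    return out

def forward(waveform, kernels, weights):
    """Raw waveform -> conv -> ReLU -> global average pool -> softmax."""
    feats = np.maximum(conv1d(waveform, kernels), 0.0)  # ReLU nonlinearity
    pooled = feats.mean(axis=1)                         # global average pooling
    logits = weights @ pooled                           # linear classifier head
    exp = np.exp(logits - logits.max())                 # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)          # 1 second of audio at 16 kHz (illustrative)
kernels = rng.standard_normal((8, 64)) * 0.1   # 8 filters with a 64-sample receptive field
weights = rng.standard_normal((len(EMOTIONS), 8)) * 0.1
probs = forward(waveform, kernels, weights)
print(dict(zip(EMOTIONS, np.round(probs, 3))))
```

With untrained random weights the output is of course uninformative; the point is only that no spectrogram or MFCC stage sits between the audio samples and the network, which is what distinguishes the raw-waveform model from the feature-based baselines compared in the paper.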


