Learning spectro-temporal features with 3D CNNs for speech emotion recognition

08/14/2017
by Jaebok Kim, et al.

In this paper, we propose deep 3-dimensional convolutional neural networks (3D CNNs) to address the challenge of modelling spectro-temporal dynamics for speech emotion recognition (SER). Compared to a hybrid of a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM), our proposed 3D CNNs simultaneously extract short-term and long-term spectral features with a moderate number of parameters. We evaluated our proposed method and other state-of-the-art methods in a speaker-independent manner, using aggregated corpora that provide a large and diverse set of speakers. We found that 1) shallow temporal and moderately deep spectral kernels of a homogeneous architecture are optimal for the task; and 2) our 3D CNNs are more effective for spectro-temporal feature learning than the other methods. Finally, we visualised the feature space obtained with our proposed method using t-distributed stochastic neighbour embedding (t-SNE) and observed distinct clusters of emotions.
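To make the trade-off between temporal and spectral kernel sizes concrete, the sketch below computes the output shape and parameter count of a single unpadded 3D convolution layer. It is only an illustration under stated assumptions: the axis order (temporal, spectral height, spectral width), the input shape, the channel counts, and the kernel sizes are all hypothetical and are not taken from the paper; the helper names `conv3d_output_shape` and `conv3d_param_count` are likewise made up for this example.

```python
from typing import Tuple

# Assumed axis order for illustration: (temporal, spectral height, spectral width).
Shape3D = Tuple[int, int, int]


def conv3d_output_shape(
    input_shape: Shape3D,
    kernel: Shape3D,
    stride: Shape3D = (1, 1, 1),
) -> Shape3D:
    """Output size of an unpadded ("valid") 3D convolution along each axis."""
    for size, k in zip(input_shape, kernel):
        if k > size:
            raise ValueError("kernel is larger than the input along one axis")
    out = [(size - k) // s + 1 for size, k, s in zip(input_shape, kernel, stride)]
    return (out[0], out[1], out[2])


def conv3d_param_count(in_channels: int, out_channels: int, kernel: Shape3D) -> int:
    """Weights plus one bias per output channel for a single 3D conv layer."""
    kt, kh, kw = kernel
    return in_channels * out_channels * kt * kh * kw + out_channels


if __name__ == "__main__":
    # Hypothetical input: 10 temporal steps of 40 x 40 spectral patches.
    input_shape = (10, 40, 40)
    # Hypothetical kernels: shallow vs. deep along the temporal axis.
    for name, kernel in [("shallow temporal", (2, 5, 5)), ("deep temporal", (5, 5, 5))]:
        shape = conv3d_output_shape(input_shape, kernel)
        params = conv3d_param_count(in_channels=1, out_channels=16, kernel=kernel)
        print(f"{name}: kernel={kernel} output={shape} params={params}")
```

With these made-up numbers, the kernel that is shallow along the temporal axis needs fewer than half the parameters of the cubic one while keeping more temporal steps in its output, which is one way to read the abstract's remark about a moderate number of parameters.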


research
12/22/2019

Emotion Recognition from Speech

In this work, we conduct an extensive comparison of various approaches t...
research
04/08/2019

Direct Modelling of Speech Emotion from Raw Speech

Speech emotion recognition is a challenging task and heavily depends on ...
research
12/26/2021

Novel Dual-Channel Long Short-Term Memory Compressed Capsule Networks for Emotion Recognition

Recent analysis on speech emotion recognition has made considerable adva...
research
10/31/2020

Efficient Arabic emotion recognition using deep neural networks

Emotion recognition from speech signal based on deep learning is an acti...
research
06/14/2023

EMERSK – Explainable Multimodal Emotion Recognition with Situational Knowledge

Automatic emotion recognition has recently gained significant attention ...
research
05/09/2018

Speaker Recognition using Deep Belief Networks

Short time spectral features such as mel frequency cepstral coefficients...
