Speech Emotion Recognition with Dual-Sequence LSTM Architecture

10/20/2019
by Jianyou Wang et al.

Speech Emotion Recognition (SER) has emerged as a critical component of the next generation of human-machine interfacing technologies. In this work, we propose a new dual-level model that combines handcrafted and raw features for audio signals. Each utterance is preprocessed into a handcrafted input and two mel-spectrograms at different time-frequency resolutions. An LSTM processes the handcrafted input, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DS-LSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3%, an improvement over current state-of-the-art unimodal models, and is comparable with multimodal SER models that leverage textual information.
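The trade-off behind the two mel-spectrogram resolutions can be illustrated with a plain-NumPy sketch: a short analysis window gives fine time resolution but coarse frequency resolution, while a long window gives the opposite. The window and hop lengths below are illustrative choices, not the paper's actual preprocessing settings.

```python
import numpy as np

def spectrogram(signal, win_len, hop):
    """Magnitude spectrogram via a sliding Hann window and FFT."""
    window = np.hanning(win_len)
    frames = [
        np.abs(np.fft.rfft(signal[start:start + win_len] * window))
        for start in range(0, len(signal) - win_len + 1, hop)
    ]
    return np.array(frames).T  # shape: (freq_bins, time_frames)

sr = 16000
t = np.arange(sr) / sr                 # one second of audio
signal = np.sin(2 * np.pi * 440 * t)   # synthetic test tone

# Short window: many time frames, few frequency bins.
S_short = spectrogram(signal, win_len=256, hop=128)
# Long window: few time frames, many frequency bins.
S_long = spectrogram(signal, win_len=1024, hop=512)

print(S_short.shape)  # (129, 124)
print(S_long.shape)   # (513, 30)
```

In the paper's architecture, two such spectrograms of the same utterance are fed to the DS-LSTM in parallel, so the model sees both a temporally detailed and a spectrally detailed view of the signal.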


Related research

09/06/2019: Learning Alignment for Multimodal Emotion Recognition from Speech
Speech emotion recognition is a challenging problem because humans convey...

10/10/2018: Multimodal Speech Emotion Recognition Using Audio and Text
Speech emotion recognition is a challenging task, and extensive reliance...

04/08/2019: Direct Modelling of Speech Emotion from Raw Speech
Speech emotion recognition is a challenging task and heavily depends on...

06/05/2018: Attention Based Fully Convolutional Network for Speech Emotion Recognition
Speech emotion recognition is a challenging task for three main reasons:...

08/28/2023: Video Multimodal Emotion Recognition System for Real World Applications
This paper proposes a system capable of recognizing a speaker's utteranc...

10/27/2020: Emotion recognition by fusing time synchronous and time asynchronous representations
In this paper, a novel two-branch neural network model structure is prop...
