Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention

06/08/2021
by Zixuan Peng, et al.

Emotion recognition from speech is a challenging task. Recent advances in deep learning have established the bi-directional recurrent neural network (Bi-RNN) with an attention mechanism as a standard method for speech emotion recognition: multi-modal features (audio and text) are extracted and attended to, then fused for downstream emotion classification. In this paper, we propose a simple yet efficient neural network architecture to exploit both acoustic and lexical information from speech. The proposed framework uses multi-scale convolutional layers (MSCNN) to obtain both audio and text hidden representations. Then, a statistical pooling unit (SPU) is used to further extract the features in each modality. In addition, an attention module can be built on top of the MSCNN-SPU (audio) and MSCNN (text) to further improve performance. Extensive experiments show that the proposed model outperforms previous state-of-the-art methods on the IEMOCAP dataset with four emotion categories (i.e., angry, happy, sad and neutral) in both weighted accuracy (WA) and unweighted accuracy (UA), with an improvement of 5.0 under the ASR setting.
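
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the convention common in speech emotion recognition work, where WA is overall accuracy across all utterances and UA is the mean of per-class recall; the abstract itself does not spell out these definitions. The function names and toy labels below are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every emotion class counts equally."""
    totals = Counter(y_true)  # utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy labels over the four emotion categories (not real data).
y_true = ["angry", "angry", "happy", "sad",
          "neutral", "neutral", "neutral", "neutral"]
y_pred = ["angry", "happy", "happy", "sad",
          "neutral", "neutral", "neutral", "sad"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

Under this assumed definition the two numbers diverge whenever classes are imbalanced, which may be why both are reported: WA is dominated by frequent classes, while UA gives each emotion the same weight.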

research
04/14/2023

HCAM – Hierarchical Cross Attention Model for Multi-modal Emotion Recognition

Emotion recognition in conversations is challenging due to the multi-mod...
research
09/23/2020

Attention Driven Fusion for Multi-Modal Emotion Recognition

Deep learning has emerged as a powerful alternative to hand-crafted meth...
research
10/10/2018

Multimodal Speech Emotion Recognition Using Audio and Text

Speech emotion recognition is a challenging task, and extensive reliance...
research
04/23/2019

Speech Emotion Recognition Using Multi-Hop Attention Mechanism

In this paper, we are interested in exploiting textual and acoustic data...
research
11/29/2018

Two-level Attention with Two-stage Multi-task Learning for Facial Emotion Recognition

Compared with facial emotion recognition on categorical model, the dimen...
research
10/31/2022

Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search

Speech emotion recognition (SER) classifies audio into emotion categorie...
research
09/10/2020

Multi-modal embeddings using multi-task learning for emotion recognition

General embeddings like word2vec, GloVe and ELMo have shown a lot of suc...
