Multi-modal Attention for Speech Emotion Recognition

09/09/2020
by   Zexu Pan, et al.

Emotion is an essential aspect of human speech that is manifested in speech prosody. Speech, visual, and textual cues are complementary in human communication. In this paper, we study a hybrid fusion method, referred to as the multi-modal attention network (MMAN), that makes use of visual and textual cues in speech emotion recognition. We propose a novel multi-modal attention mechanism, cLSTM-MMA, which facilitates attention across the three modalities and selectively fuses the information. cLSTM-MMA is combined with uni-modal sub-networks in a late-fusion stage. Experiments show that speech emotion recognition benefits significantly from visual and textual cues, and that cLSTM-MMA alone is as competitive as other fusion methods in terms of accuracy, while having a much more compact network structure. The proposed hybrid network MMAN achieves state-of-the-art performance for emotion recognition on the IEMOCAP database.
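The abstract describes, but does not detail, how cLSTM-MMA attends across the speech, visual, and textual streams and how its output is combined with uni-modal sub-networks. The sketch below is a minimal PyTorch illustration of that general idea, cross-modal attention over three encoded modalities followed by fusion for classification. The LSTM encoders, the use of nn.MultiheadAttention, the hidden sizes, the mean-pooling, and the concatenation-based fusion are all assumptions made for illustration; this is not the authors' exact cLSTM-MMA/MMAN implementation.

```python
# Illustrative sketch of cross-modal attention + fusion for emotion recognition.
# NOT the authors' exact architecture; layer choices and sizes are assumptions.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Encodes each modality with an LSTM, lets each modality attend over the
    concatenation of all three, then fuses the attended summaries for classification."""

    def __init__(self, dims, hidden=128, n_heads=4, n_classes=4):
        super().__init__()
        # One bidirectional LSTM encoder per modality (speech, visual, text).
        self.encoders = nn.ModuleList(
            [nn.LSTM(d, hidden, batch_first=True, bidirectional=True) for d in dims]
        )
        # Shared multi-head attention: query = one modality, key/value = all modalities.
        self.attn = nn.MultiheadAttention(2 * hidden, n_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(3 * 2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_classes)
        )

    def forward(self, speech, visual, text):
        # Encode each modality: (B, T_m, D_m) -> (B, T_m, 2*hidden).
        encoded = [enc(x)[0] for enc, x in zip(self.encoders, (speech, visual, text))]
        all_mod = torch.cat(encoded, dim=1)  # concatenate along time: (B, sum T_m, 2H)
        # Each modality queries the multi-modal memory, then mean-pool over time.
        attended = [self.attn(q, all_mod, all_mod)[0].mean(dim=1) for q in encoded]
        fused = torch.cat(attended, dim=-1)  # (B, 3 * 2H)
        return self.classifier(fused)        # emotion class logits


if __name__ == "__main__":
    # Hypothetical feature dimensions: e.g. MFCCs, face embeddings, word embeddings.
    model = CrossModalAttentionFusion(dims=(40, 512, 300))
    s = torch.randn(2, 100, 40)   # speech frames
    v = torch.randn(2, 50, 512)   # visual frames
    t = torch.randn(2, 20, 300)   # token embeddings
    print(model(s, v, t).shape)   # -> torch.Size([2, 4])
```

In a hybrid setup such as MMAN, a module of this kind would sit alongside uni-modal branches, with their predictions combined at a late-fusion stage rather than relying on the attention block alone.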

