Multi-modal Attention for Speech Emotion Recognition

09/09/2020
by   Zexu Pan, et al.

Emotion represents an essential aspect of human speech that is manifested in speech prosody. Speech, visual, and textual cues are complementary in human communication. In this paper, we study a hybrid fusion method, referred to as the multi-modal attention network (MMAN), to make use of visual and textual cues in speech emotion recognition. We propose a novel multi-modal attention mechanism, cLSTM-MMA, which facilitates attention across the three modalities and selectively fuses the information. cLSTM-MMA is combined with other uni-modal sub-networks through late fusion. The experiments show that speech emotion recognition benefits significantly from visual and textual cues, and that the proposed cLSTM-MMA alone is as competitive as other fusion methods in terms of accuracy, but with a much more compact network structure. The proposed hybrid network MMAN achieves state-of-the-art performance on the IEMOCAP database for emotion recognition.
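As a rough illustration of the two ideas named in the abstract (attention across modalities, followed by late fusion of sub-network outputs), here is a minimal sketch. It is not the authors' cLSTM-MMA or MMAN: the scaled dot-product scoring, the averaging rule for late fusion, the toy feature vectors, and the function names `cross_modal_attention` and `late_fusion` are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(query, context):
    """Attend from one modality's vector over vectors from other modalities.

    Assumption: scaled dot-product scoring, chosen here for simplicity.
    Returns the attention-weighted sum of the context vectors and the weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, c) / scale for c in context])
    fused = [
        sum(w * c[i] for w, c in zip(weights, context))
        for i in range(len(query))
    ]
    return fused, weights


def late_fusion(prob_lists):
    """Assumption: late fusion as a plain average of class posteriors."""
    count = len(prob_lists)
    return [
        sum(p[k] for p in prob_lists) / count
        for k in range(len(prob_lists[0]))
    ]


# Hypothetical toy features for one utterance (not real data).
speech = [1.0, 0.0]
visual = [1.0, 0.0]
text = [0.0, 1.0]

# The speech vector attends over the visual and textual vectors.
fused, weights = cross_modal_attention(speech, [visual, text])

# Hypothetical class posteriors from two sub-networks, combined late.
combined = late_fusion([[0.7, 0.3], [0.5, 0.5]])
print(fused, weights, combined)
```

In this sketch the attention weights decide how much each of the other modalities contributes to the fused vector, while late fusion operates only on the final class scores of separately trained sub-networks.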

06/05/2022

M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation

Emotion Recognition in Conversations (ERC) is crucial in developing symp...
04/06/2021

Efficient emotion recognition using hyperdimensional computing with combinatorial channel encoding and cellular automata

In this paper, a hardware-optimized approach to emotion recognition base...
11/09/2019

M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues

We present M3ER, a learning-based method for emotion recognition from mu...
08/07/2021

HetEmotionNet: Two-Stream Heterogeneous Graph Recurrent Neural Network for Multi-modal Emotion Recognition

The research on human emotion under multimedia stimulation based on phys...
05/24/2005

Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks

A major challenge for the realization of intelligent robots is to supply...
08/05/2020

Compact Graph Architecture for Speech Emotion Recognition

We propose a deep graph approach to address the task of speech emotion r...