HCAM – Hierarchical Cross Attention Model for Multi-modal Emotion Recognition

04/14/2023
by Soumya Dutta, et al.

Emotion recognition in conversations is challenging due to the multi-modal nature of emotion expression. We propose a hierarchical cross-attention model (HCAM) for multi-modal emotion recognition using a combination of recurrent and co-attention neural network models. The input to the model consists of two modalities: i) audio data, processed through a learnable wav2vec approach, and ii) text data, represented using a bidirectional encoder representations from transformers (BERT) model. The audio and text representations are processed by a set of bi-directional recurrent neural network layers with self-attention that convert each utterance in a given conversation to a fixed-dimensional embedding. To incorporate contextual knowledge and information across the two modalities, the audio and text embeddings are combined using a co-attention layer that weighs the utterance-level embeddings most relevant to the task of emotion recognition. The neural network parameters in the audio layers, text layers, and multi-modal co-attention layers are trained hierarchically for the emotion classification task. We perform experiments on three established datasets, namely IEMOCAP, MELD and CMU-MOSI, and show that the proposed model improves significantly over other benchmarks, achieving state-of-the-art results on all three datasets.
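The co-attention fusion step described above can be illustrated with a minimal sketch: each utterance embedding from one modality (e.g. text) attends over the sequence of utterance embeddings from the other modality (e.g. audio) via scaled dot-product attention. This is a generic cross-attention sketch, not the authors' exact layer; the function and variable names (`cross_attend`, `queries`, `keys`, `values`) are illustrative assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, keys, values):
    """Scaled dot-product cross-attention (illustrative sketch).

    Each query vector (e.g. a text utterance embedding) attends over
    the other modality's sequence (e.g. audio utterance embeddings,
    passed as keys/values) and returns a weighted sum of the values.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output per query.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

In a full model, the attended output would typically be concatenated with the query embedding and passed to a classifier; here the sketch only shows how the attention weights select the audio utterances most relevant to each text utterance.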
