A Low-rank Matching Attention based Cross-modal Feature Fusion Method for Conversational Emotion Recognition

06/16/2023
by Yuntao Shou, et al.

Conversational emotion recognition (CER) is an important research topic in human-computer interaction. Although deep learning (DL) based CER approaches have achieved excellent performance, the cross-modal feature fusion methods used in existing DL-based approaches either ignore intra-modal and inter-modal emotional interactions or have high computational complexity. To address these issues, this paper develops a novel cross-modal feature fusion method for the CER task: the low-rank matching attention method (LMAM). By setting a matching weight and calculating attention scores between modal features row by row, LMAM requires fewer parameters than the self-attention method. We further apply low-rank decomposition to the matching weight, reducing the parameter count of LMAM to less than one-third of that of self-attention. LMAM can therefore alleviate the over-fitting caused by a large number of parameters. Additionally, by computing and fusing the similarity of intra-modal and inter-modal features, LMAM fully exploits both the contextual information within each modality and the complementary semantic information across modalities (i.e., text, video and audio). Experimental results on several benchmark datasets show that LMAM can be embedded into existing state-of-the-art DL-based CER methods and boost their performance in a plug-and-play manner. The experiments also verify the superiority of LMAM over other popular cross-modal fusion methods. Moreover, LMAM is a general cross-modal fusion method and can thus be applied to other multi-modal recognition tasks, e.g., session recommendation and humour detection.
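To make the idea concrete, here is a minimal sketch of low-rank cross-modal matching attention. The abstract does not give the exact formulation, so all names, shapes, and the score function below are assumptions: a d×d matching weight W is factorized as W = U·V (rank r), so cross-modal attention scores between two modalities cost 2·d·r parameters instead of d², and r < d/2 keeps the count below one-third of the 3·d² used by a standard self-attention's query/key/value projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lmam_fuse(X_a, X_b, U, V):
    """Illustrative low-rank matching attention (not the paper's exact code).

    X_a, X_b : (n, d) feature matrices from two modalities (e.g. text, audio).
    U : (d, r) and V : (r, d) are the low-rank factors of a hypothetical
    d x d matching weight W = U @ V, giving 2*d*r parameters instead of d*d.
    """
    scores = X_a @ U @ V @ X_b.T       # (n, n) cross-modal matching scores
    attn = softmax(scores, axis=-1)    # attention computed row by row
    return attn @ X_b                  # X_b features aligned to X_a

rng = np.random.default_rng(0)
n, d, r = 4, 8, 2                      # rank r < d/2
X_text = rng.normal(size=(n, d))
X_audio = rng.normal(size=(n, d))
U = rng.normal(size=(d, r)) * 0.1
V = rng.normal(size=(r, d)) * 0.1
fused = lmam_fuse(X_text, X_audio, U, V)
print(fused.shape)  # (4, 8)
```

The same routine can be applied within a modality (X_a = X_b) to capture intra-modal context, and across pairs of modalities for complementary information, which is the fusion pattern the abstract describes.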

Related research:
- 11/03/2021 · A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition
- 12/27/2020 · Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition
- 07/25/2022 · GA2MIF: Graph and Attention based Two-stage Multi-source Information Fusion for Conversational Emotion Detection
- 11/01/2019 · Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning
- 05/25/2023 · Context-Aware Attention Layers coupled with Optimal Transport Domain Adaptation methods for recognizing dementia from spontaneous speech
- 10/15/2020 · DialogueTRM: Exploring the Intra- and Inter-Modal Emotional Behaviors in the Conversation
- 06/12/2018 · Attentive cross-modal paratope prediction