Bridging the Emotional Semantic Gap via Multimodal Relevance Estimation

02/03/2023
by Chuan Zhang, et al.

Human beings express emotion in rich and varied ways, including facial actions, voice, and natural language. Because individuals differ in how they express themselves, the emotions conveyed by different modalities may be semantically irrelevant to one another. Directly fusing information from all modalities therefore exposes the model to noise from the semantically irrelevant ones. To tackle this problem, we propose a multimodal relevance estimation network that captures the relevant semantics shared among modalities in multimodal emotion. Specifically, we use an attention mechanism to estimate a semantic relevance weight for each modality. Moreover, we propose a relevant-semantic estimation loss to weakly supervise the semantics of each modality. Furthermore, we use contrastive learning to align category-level, modality-relevant semantics across modalities in feature space, thereby bridging the semantic gap between heterogeneous modalities. To better reflect emotional states in real interactive scenarios and to support semantic relevance analysis, we collect a single-label discrete multimodal emotion dataset named SDME, which enables research on multimodal semantic relevance under large category bias. Experiments on both continuous and discrete emotion datasets show that our model effectively captures the relevant semantics, especially when modal semantics deviate strongly. The code and the SDME dataset will be made publicly available.
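
The abstract names three ingredients: attention-derived relevance weights per modality, a weakly supervised relevance loss, and contrastive alignment of category-level semantics across modalities. The paper's actual architecture and loss definitions are not given here, so the sketch below is only a minimal, assumption-laden illustration of the first and third ideas: it assumes three pre-extracted modality feature vectors of a common dimension and uses a supervised-contrastive formulation; names such as ModalityRelevanceFusion and cross_modal_contrastive_loss are hypothetical and not from the authors' code.

```python
# Minimal sketch (NOT the authors' released code) of attention-based modality
# relevance weighting plus a cross-modal, category-level contrastive loss.
# Assumes pre-extracted face/voice/text features of equal dimension.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityRelevanceFusion(nn.Module):
    """Attention over modalities: each modality gets a scalar relevance weight,
    so semantically irrelevant modalities are down-weighted in the fusion."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scalar relevance score per modality

    def forward(self, feats: torch.Tensor):
        # feats: (batch, num_modalities, dim)
        scores = self.score(feats).squeeze(-1)          # (batch, num_modalities)
        weights = torch.softmax(scores, dim=-1)         # relevance weights, sum to 1
        fused = (weights.unsqueeze(-1) * feats).sum(1)  # (batch, dim)
        return fused, weights


def cross_modal_contrastive_loss(feats: torch.Tensor, labels: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """Supervised-contrastive-style loss: features of any modality that share an
    emotion category are pulled together, others pushed apart. This is one
    plausible reading of 'category-level similarity across modalities'."""
    b, m, d = feats.shape
    z = F.normalize(feats.reshape(b * m, d), dim=-1)    # flatten modalities
    lab = labels.repeat_interleave(m)                   # (b*m,) matching the flatten order
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    pos = (lab.unsqueeze(0) == lab.unsqueeze(1)).float()
    pos.fill_diagonal_(0)                               # exclude self-pairs as positives
    logits = sim - 1e9 * torch.eye(b * m, device=z.device)  # mask self in the denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    denom = pos.sum(dim=1).clamp(min=1)
    return -(pos * log_prob).sum(dim=1).div(denom).mean()


if __name__ == "__main__":
    feats = torch.randn(8, 3, 256)        # 8 samples, 3 modalities, 256-d features
    labels = torch.randint(0, 6, (8,))    # 6 discrete emotion categories
    fused, weights = ModalityRelevanceFusion(dim=256)(feats)
    loss = F.cross_entropy(nn.Linear(256, 6)(fused), labels) \
           + cross_modal_contrastive_loss(feats, labels)
    print(fused.shape, weights.shape, float(loss))
```

In this toy setup the relevance weights and the contrastive term are trained jointly with an ordinary classification loss on the fused feature; the abstract's weakly supervised relevance estimation loss is not reproduced here because its form is not specified.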

Related research

12/20/2022 - InterMulti: Multi-view Multimodal Interactions with Text-dominated Hierarchical High-order Fusion for Emotion Analysis
Humans are sophisticated at reading interlocutors' emotions from multimo...

03/06/2022 - Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
With the assumption that a video dataset is multimodality annotated in w...

12/16/2022 - EffMulti: Efficiently Modeling Complex Multimodal Interactions for Emotion Analysis
Humans are skilled in reading the interlocutor's emotion from multimodal...

07/28/2021 - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis
Multimodal sentiment analysis aims to extract and integrate semantic inf...

03/10/2022 - BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis
Achieving realistic, vivid, and human-like synthesized conversational ge...

08/15/2023 - Emotion Embeddings - Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets
Human emotion is expressed in many communication modalities and media fo...

08/31/2020 - Toward Multimodal Modeling of Emotional Expressiveness
Emotional expressiveness captures the extent to which a person tends to ...
