Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis

08/16/2022
by Licai Sun, et al.

With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA) has attracted increasing attention. Despite significant progress, two major challenges remain on the path to robust MSA: 1) inefficiency in modeling cross-modal interactions in unaligned multimodal data; and 2) vulnerability to randomly missing modality features, which typically occur in realistic settings. In this paper, we propose a generic and unified framework to address both, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR). Concretely, EMT employs utterance-level representations from each modality as the global multimodal context, which interacts with local unimodal features so that the two mutually promote each other. This not only avoids the quadratic scaling cost of previous local-local cross-modal interaction methods but also leads to better performance. To improve model robustness in the incomplete modality setting, DLFR on the one hand performs low-level feature reconstruction to implicitly encourage the model to learn semantic information from incomplete data. On the other hand, it regards complete and incomplete data as two different views of one sample and uses siamese representation learning to explicitly attract their high-level representations. Comprehensive experiments on three popular datasets demonstrate that our method achieves superior performance in both complete and incomplete modality settings.
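The three ideas in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. All function names and the sequence lengths are hypothetical. The choice of masked mean squared error for the low-level reconstruction term and negative cosine similarity for the siamese term are assumptions (both are common choices, but the abstract does not specify them). The pair count also assumes one global token per modality and counts attention in both directions.

```python
import math


def attention_pair_counts(lengths, num_global):
    """Count query-key pairs for two cross-modal interaction schemes.

    lengths: per-modality sequence lengths (hypothetical values).
    num_global: number of global context tokens (assumed: one per modality).
    """
    # Local-local: every token of one modality attends to every token
    # of each other modality, so the cost grows with products of lengths.
    local_local = sum(
        lq * lk
        for i, lq in enumerate(lengths)
        for j, lk in enumerate(lengths)
        if i != j
    )
    # Global-local: local tokens attend to the few global tokens and the
    # global tokens attend back, so the cost is linear in total length.
    global_local = 2 * num_global * sum(lengths)
    return local_local, global_local


def masked_mse(original, reconstructed, keep_mask):
    """Mean squared error over frames that were dropped (keep_mask == 0)."""
    errors = [
        (o - r) ** 2
        for orig_frame, rec_frame, keep in zip(original, reconstructed, keep_mask)
        if keep == 0
        for o, r in zip(orig_frame, rec_frame)
    ]
    return sum(errors) / len(errors) if errors else 0.0


def negative_cosine(p, z):
    """Negative cosine similarity; minimizing it pulls p and z together."""
    dot = sum(a * b for a, b in zip(p, z))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in z))
    return -dot / norm


if __name__ == "__main__":
    # Hypothetical text, audio, and video sequence lengths.
    print(attention_pair_counts([50, 375, 500], num_global=3))  # (462500, 5550)

    # Frame 0 is kept, frame 1 is missing and must be reconstructed.
    original = [[1.0, 2.0], [3.0, 4.0]]
    reconstructed = [[0.0, 0.0], [3.0, 2.0]]
    print(masked_mse(original, reconstructed, keep_mask=[1, 0]))  # 2.0

    # Identical complete-view and incomplete-view representations.
    print(negative_cosine([1.0, 0.0], [1.0, 0.0]))  # -1.0
```

With these hypothetical lengths, the local-local scheme scores roughly 80 times more query-key pairs than the global-local scheme, which is the kind of gap the abstract's efficiency claim refers to.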

research
10/26/2022

Multimodal Contrastive Learning via Uni-Modal Coding and Cross-Modal Prediction for Multimodal Sentiment Analysis

Multimodal representation learning is a challenging task in which previo...
research
11/10/2021

Which is Making the Contribution: Modulating Unimodal and Cross-modal Dynamics for Multimodal Sentiment Analysis

Multimodal sentiment analysis (MSA) draws increasing attention with the ...
research
06/16/2022

Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos

Multimodal sentiment analysis in videos is a key task in many real-world...
research
01/24/2022

MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis

Current deep learning approaches for multimodal fusion rely on bottom-up...
research
09/15/2023

One-stage Modality Distillation for Incomplete Multimodal Learning

Learning based on multimodal data has attracted increasing interest rece...
research
04/12/2022

Are Multimodal Transformers Robust to Missing Modality?

Multimodal data collected from the real world are often imperfect due to...
research
07/20/2023

General Debiasing for Multimodal Sentiment Analysis

Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal...
