ScaleVLAD: Improving Multimodal Sentiment Analysis via Multi-Scale Fusion of Locally Descriptors

12/02/2021
by Huaishao Luo, et al.

Fusion is a key research topic in multimodal sentiment analysis. Recent attention-based fusion methods demonstrate advances over simple operation-based fusion. However, these approaches adopt a single-scale unimodal representation, i.e., token-level or utterance-level. Such single-scale fusion is suboptimal because different modalities should be aligned at different granularities. This paper proposes a fusion model named ScaleVLAD, which gathers multi-Scale representations from text, video, and audio with shared Vectors of Locally Aggregated Descriptors to improve unaligned multimodal sentiment analysis. These shared vectors can be regarded as shared topics that align the different modalities. In addition, we propose a self-supervised shifted clustering loss to keep the fused features differentiated across samples. The backbone consists of three Transformer encoders, one per modality; the aggregated features generated by the fusion module are fed to a Transformer plus a fully connected layer to produce task predictions. Experiments on three popular sentiment analysis benchmarks, IEMOCAP, MOSI, and MOSEI, demonstrate significant gains over baselines.
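The aggregation described in the abstract builds on VLAD (Vectors of Locally Aggregated Descriptors). As a rough illustration only, not the authors' implementation, a NetVLAD-style soft-assignment aggregation over one modality's local features can be sketched in NumPy. The function name `vlad_aggregate`, the `alpha` temperature, and the normalization scheme are assumptions for this sketch:

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def vlad_aggregate(features, shared_vectors, alpha=1.0):
    """Aggregate a sequence of local features (T, D) against K shared
    cluster vectors (K, D): softly assign each feature to the shared
    vectors, then sum the per-cluster residuals (NetVLAD-style sketch)."""
    # soft assignment of each local feature to each shared vector
    sim = alpha * features @ shared_vectors.T                      # (T, K)
    assign = softmax(sim, axis=-1)                                 # (T, K)
    # residuals between features and the shared cluster vectors
    residuals = features[:, None, :] - shared_vectors[None, :, :]  # (T, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)            # (K, D)
    # intra-normalize per cluster, then flatten and L2-normalize
    vlad /= np.linalg.norm(vlad, axis=-1, keepdims=True) + 1e-12
    out = vlad.reshape(-1)                                         # (K * D,)
    return out / (np.linalg.norm(out) + 1e-12)
```

Because the shared vectors act as common "topics" across modalities, text, video, and audio sequences of different (unaligned) lengths all map to the same fixed-size K*D descriptor, which is what makes the subsequent fusion possible.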

