DM^2S^2: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention

09/07/2022
by Shunsuke Kitada, et al.

There is increasing interest in the use of multimodal data in web applications such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, the mid-fusion structure runs into several problems, such as the growing dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that treats multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM^2S^2). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements within a modality, and (c) inter-modality residual attention (InterMRA) to further enhance the importance of elements at modality-level granularity. Our concept exhibits performance comparable to or better than that of previous set-aware models. Furthermore, we demonstrate that visualizing the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.
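The two attention levels described above can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the scoring functions or the exact residual form, so the dot-product scoring vectors `w` and `v`, the `h + alpha * h` residual, the mean pooling, and all function names below are assumptions made for illustration. Random arrays stand in for the BERT-based encoder outputs. The point of the sketch is that a set of variable-length sequences can be fused into a fixed-size vector even when a modality is missing, while exposing element-level and modality-level weights for inspection.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def intra_modality_attention(h, w):
    # h: (T, d) encoded elements of one modality.
    # w: (d,) hypothetical scoring vector (an assumption, not from the paper).
    alpha = softmax(h @ w)                 # element-level weights, shape (T,)
    h_res = h + alpha[:, None] * h         # assumed residual form
    return h_res, alpha

def inter_modality_attention(pooled, v):
    # pooled: (M, d), one pooled vector per modality that is present.
    # v: (d,) hypothetical scoring vector (an assumption, not from the paper).
    beta = softmax(pooled @ v)             # modality-level weights, shape (M,)
    fused = ((1.0 + beta)[:, None] * pooled).sum(axis=0)  # assumed residual form
    return fused, beta

def fuse(modalities, w, v):
    # modalities: dict of name -> (T_m, d) array; a missing modality is simply absent.
    names, pooled, intra = [], [], {}
    for name, h in modalities.items():
        h_res, alpha = intra_modality_attention(h, w)
        pooled.append(h_res.mean(axis=0))  # mean pooling is an assumption
        intra[name] = alpha
        names.append(name)
    fused, beta = inter_modality_attention(np.stack(pooled), v)
    return fused, intra, dict(zip(names, beta))

# Usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
d = 8
w, v = rng.normal(size=d), rng.normal(size=d)
full = {
    "title": rng.normal(size=(5, d)),
    "image": rng.normal(size=(3, d)),
    "category": rng.normal(size=(1, d)),
}
fused_full, intra_w, inter_w = fuse(full, w, v)

# Drop one modality: the output size does not change.
partial = {k: full[k] for k in ("title", "category")}
fused_partial, _, inter_partial = fuse(partial, w, v)
```

Because the fused vector has dimension `d` regardless of how many modalities are present, this kind of set-style pooling avoids the growth in dimensionality that concatenation-based mid-fusion incurs. The returned `intra_w` and `inter_w` dictionaries are the quantities one would plot to inspect which elements and which modalities contributed most.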


