Dynamic Multimodal Fusion

03/31/2022
by Zihui Xue, et al.

Deep multimodal learning has achieved great progress in recent years. However, current fusion approaches are static in nature: they process and fuse multimodal inputs with identical computation, without accounting for the diverse computational demands of different multimodal data. In this work, we propose dynamic multimodal fusion (DynMM), a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference. DynMM reduces redundant computation for "easy" multimodal inputs (those that can be predicted correctly using only one modality or simple fusion techniques) while retaining representation power for "hard" samples by adopting all modalities and complex fusion operations for prediction. Results on various multimodal tasks demonstrate the efficiency and wide applicability of our approach. For instance, DynMM can reduce the computation cost by 46.5% with a negligible accuracy loss on CMU-MOSEI sentiment analysis. For RGB-D semantic segmentation on NYU Depth data, DynMM achieves a +0.7% mIoU improvement together with computation reductions for the depth encoder when compared with strong baselines. We believe this opens a novel direction for dynamic multimodal network design, with applications to a wide range of multimodal tasks.
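The core idea, a lightweight gate that routes each sample either to a cheap unimodal branch or to a more expensive fusion branch, can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the linear heads, the two-branch layout, and the hard argmax gating rule are all illustrative assumptions.

```python
import numpy as np

# Illustrative per-sample dynamic fusion. All weights are random
# placeholders standing in for trained parameters.
rng = np.random.default_rng(0)
D_A, D_B, H, C = 8, 8, 16, 3            # modality dims, hidden size, classes

W_a  = rng.normal(size=(D_A, C))         # cheap branch: unimodal head
W_fx = rng.normal(size=(D_A, H))         # expensive branch: fuse both modalities
W_fy = rng.normal(size=(D_B, H))
W_o  = rng.normal(size=(H, C))
W_g  = rng.normal(size=(D_A + D_B, 2))   # gating network: one logit per branch

def dynmm_forward(x_a, x_b):
    """Route one sample through a data-dependent forward path.

    The gate scores both branches from the raw inputs; at inference a
    hard decision lets "easy" samples skip the fusion branch entirely.
    """
    branch = int(np.argmax(np.concatenate([x_a, x_b]) @ W_g))
    if branch == 0:                      # "easy": one modality suffices
        logits = x_a @ W_a
    else:                                # "hard": full multimodal fusion
        logits = np.tanh(x_a @ W_fx + x_b @ W_fy) @ W_o
    return logits, branch

logits, branch = dynmm_forward(rng.normal(size=D_A), rng.normal(size=D_B))
```

In the paper's setting the gate would be trained jointly with the branches (with a resource-aware loss encouraging the cheap path), so the computational savings come from how often the expensive branch is skipped across a dataset.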


