Provable Dynamic Fusion for Low-Quality Multimodal Data

06/03/2023
by Qingyang Zhang, et al.
The inherent challenge of multimodal fusion is to precisely capture cross-modal correlation and flexibly conduct cross-modal interaction. To fully exploit the value of each modality and mitigate the influence of low-quality multimodal data, dynamic multimodal fusion has emerged as a promising learning paradigm. Despite its widespread use, theoretical justification in this field is still notably lacking. Can we design a provably robust multimodal fusion method? This paper provides a theoretical understanding that answers this question, from the generalization perspective, under one of the most popular multimodal fusion frameworks. We then show that several uncertainty estimation solutions are naturally available to achieve robust multimodal fusion. Finally, we propose a novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF), which improves both classification accuracy and model robustness. Extensive experimental results on multiple benchmarks support our findings.
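
To make the idea of dynamic fusion concrete, here is a minimal sketch of uncertainty-weighted late fusion. It is not the QMF method from the paper: the abstract does not specify the uncertainty estimator or the fusion rule, so this example assumes a generic entropy-based confidence score and a weighted sum of per-modality logits. The function names `confidence` and `dynamic_fusion` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Assumed confidence score: 1 minus normalized predictive entropy.

    Close to 1.0 for a sharply peaked prediction, 0.0 for a uniform one.
    """
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - entropy / math.log(len(logits)))


def dynamic_fusion(modality_logits):
    """Fuse per-modality logits with sample-dependent weights.

    Each modality's weight is proportional to its confidence on this
    sample, so an uninformative (low-quality) modality contributes less.
    Returns (fused_logits, weights).
    """
    confs = [confidence(l) for l in modality_logits]
    total = sum(confs)
    if total <= 1e-12:
        # Every modality is maximally uncertain: fall back to a plain average.
        weights = [1.0 / len(confs)] * len(confs)
    else:
        weights = [c / total for c in confs]
    n_classes = len(modality_logits[0])
    fused = [
        sum(w * l[k] for w, l in zip(weights, modality_logits))
        for k in range(n_classes)
    ]
    return fused, weights


# Usage: one confident modality and one near-uniform (noisy) modality.
clean = [4.0, 0.0, 0.0]
noisy = [0.1, 0.0, 0.2]
fused, weights = dynamic_fusion([clean, noisy])
```

In this sketch the weights are recomputed for every sample, which is what distinguishes dynamic fusion from static fusion with fixed modality weights. Any other uncertainty estimate could be substituted for the entropy score.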

research
04/19/2019

EmbraceNet: A robust deep learning architecture for multimodal classification

Classification using multimodal data arises in many machine learning app...
research
05/17/2023

Object Segmentation by Mining Cross-Modal Semantics

Multi-sensor clues have shown promise for object segmentation, but inher...
research
10/31/2022

Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations

Learning effective joint embedding for cross-modal data has always been ...
research
03/20/2021

A novel multimodal fusion network based on a joint coding model for lane line segmentation

There has recently been growing interest in utilizing multimodal sensors...
research
11/04/2022

Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions

As multimodal learning finds applications in a wide variety of high-stak...
research
11/16/2022

A Unified Multimodal De- and Re-coupling Framework for RGB-D Motion Recognition

Motion recognition is a promising direction in computer vision, but the ...
research
08/11/2021

Abstractive Sentence Summarization with Guidance of Selective Multimodal Reference

Multimodal abstractive summarization with sentence output is to generate...
