What Makes Multimodal Learning Better than Single (Provably)

06/08/2021
by Yu Huang, et al.

The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform unimodal models, since more information is aggregated. Recently, joining the success of deep learning, there is an influential line of work on deep multimodal learning, which has remarkable empirical results on various applications. However, theoretical justifications in this field are notably lacking. Can multimodal learning provably perform better than unimodal learning? In this paper, we answer this question under a popular multimodal learning framework, which first encodes features from different modalities into a common latent space and then maps the latent representations into the task space. We prove that learning with multiple modalities achieves a smaller population risk than using only a subset of those modalities. The main intuition is that the former yields a more accurate estimate of the latent space representation. To the best of our knowledge, this is the first theoretical treatment to capture important qualitative phenomena observed in real multimodal applications. Combined with experimental results, we show that multimodal learning does possess an appealing formal guarantee.
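The two-stage framework described in the abstract (encode each modality into a common latent space, then map the latent representation to the task space) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear encoders, the averaging used to fuse latents, the modality names "image" and "text", and all dimensions.

```python
import numpy as np

def encode(x, W):
    """Linear encoder mapping one modality's features into the shared latent space."""
    return x @ W

def fuse_and_predict(inputs, encoders, head):
    """Encode each available modality, average in latent space, then map to the task space."""
    latents = [encode(inputs[name], encoders[name]) for name in inputs]
    z = np.mean(latents, axis=0)   # fusion by averaging (an illustrative assumption)
    return z @ head                # task-space mapping

rng = np.random.default_rng(0)
n, latent_dim, task_dim = 8, 4, 2
dims = {"image": 16, "text": 10}   # hypothetical modalities and feature sizes

# Randomly initialised parameters; a real model would learn these from data.
encoders = {m: rng.normal(size=(d, latent_dim)) for m, d in dims.items()}
head = rng.normal(size=(latent_dim, task_dim))
inputs = {m: rng.normal(size=(n, d)) for m, d in dims.items()}

# Multimodal prediction uses both modalities; the unimodal one uses a subset.
y_multi = fuse_and_predict(inputs, encoders, head)
y_uni = fuse_and_predict({"image": inputs["image"]}, encoders, head)
```

The sketch only shows how a model over all modalities and a model over a subset share the same latent space and task mapping; it does not demonstrate the population-risk result itself, which depends on learning the encoders and the task map from data.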


