DeepAI AI Chat
Log In Sign Up

What Makes Multimodal Learning Better than Single (Provably)

by   Yu Huang, et al.

The world provides us with data of multiple modalities. Intuitively, models fusingdata from different modalities outperform unimodal models, since more informationis aggregated. Recently, joining the success of deep learning, there is an influentialline of work on deep multimodal learning, which has remarkable empirical resultson various applications. However, theoretical justifications in this field are notablylacking.Can multimodal provably perform better than unimodal? In this paper, we answer this question under a most popular multimodal learningframework, which firstly encodes features from different modalities into a commonlatent space and seamlessly maps the latent representations into the task space. Weprove that learning with multiple modalities achieves a smaller population risk thanonly using its subset of modalities. The main intuition is that the former has moreaccurate estimate of the latent space representation. To the best of our knowledge,this is the first theoretical treatment to capture important qualitative phenomenaobserved in real multimodal applications. Combining with experiment results, weshow that multimodal learning does possess an appealing formal guarantee.


page 1

page 2

page 3

page 4


Score-Based Multimodal Autoencoders

Multimodal Variational Autoencoders (VAEs) represent a promising group o...

Multimodal Understanding Through Correlation Maximization and Minimization

Multimodal learning has mainly focused on learning large models on, and ...

On Robustness in Multimodal Learning

Multimodal learning is defined as learning over multiple heterogeneous i...

Factorized Multimodal Transformer for Multimodal Sequential Learning

The complex world around us is inherently multimodal and sequential (con...

Comparing Trajectory and Vision Modalities for Verb Representation

Three-dimensional trajectories, or the 3D position and rotation of objec...

A Review on Methods and Applications in Multimodal Deep Learning

Deep Learning has implemented a wide range of applications and has becom...