SMIL: Multimodal Learning with Severely Missing Modality

by Mengmeng Ma, et al.

A common assumption in multimodal learning is the completeness of training data, i.e., all modalities are available in every training example. Although there is research on methods that tackle incomplete testing data, e.g., modalities partially missing in testing examples, few of these methods can handle incomplete training modalities. The problem becomes even more challenging when modalities are severely missing, e.g., 90% of training examples may have incomplete modalities. For the first time in the literature, this paper formally studies multimodal learning with missing modality in terms of flexibility (missing modalities in training, testing, or both) and efficiency (most training data have incomplete modality). Technically, we propose a new method named SMIL that leverages Bayesian meta-learning to achieve both objectives in a unified way. To validate our idea, we conduct a series of experiments on three popular benchmarks: MM-IMDb, CMU-MOSI, and avMNIST. The results demonstrate the state-of-the-art performance of SMIL over existing methods and generative baselines, including autoencoders and generative adversarial networks. Our code is available at


