Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

03/23/2022
by   Yu Huang, et al.

Despite the remarkable success of deep multi-modal learning in practice, it has not been well explained in theory. Recent work has observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally carry more information. This work provides a theoretical explanation for the emergence of this performance gap in neural networks under the prevalent joint training framework. Based on a simplified data distribution that captures realistic properties of multi-modal data, we prove that for a multi-modal late-fusion network with (smoothed) ReLU activation trained jointly by gradient descent, different modalities compete with each other, and the encoder networks learn only a subset of the modalities. We refer to this phenomenon as modality competition. The losing modalities, which fail to be discovered, are the source of the sub-optimality of joint training. Experimentally, we illustrate that modality competition matches the intrinsic behavior of late-fusion joint training.
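To make the setting concrete, the following is a minimal sketch of a two-modality late-fusion forward pass with a smoothed ReLU. It is an illustration under stated assumptions, not the construction analyzed in the paper: the polynomial form of the smoothing, its parameters `q` and `beta`, the restriction to two modalities, and the names `smoothed_relu`, `encoder`, and `late_fusion` are all hypothetical choices made here.

```python
def smoothed_relu(z, q=3, beta=1.0):
    """Polynomially smoothed ReLU (an assumed form, not taken from the abstract).

    Zero for z <= 0, a degree-q polynomial on (0, beta], and linear with
    slope 1 above beta. The pieces match in value and slope at z = beta.
    """
    if z <= 0.0:
        return 0.0
    if z <= beta:
        return z ** q / (q * beta ** (q - 1))
    return z - (1.0 - 1.0 / q) * beta


def encoder(weights, x):
    """One modality's encoder: a sum of smoothed-ReLU neurons applied to x.

    `weights` is a list of weight vectors, one per neuron.
    """
    total = 0.0
    for w in weights:
        pre_activation = sum(w_i * x_i for w_i, x_i in zip(w, x))
        total += smoothed_relu(pre_activation)
    return total


def late_fusion(weights_a, weights_b, x_a, x_b):
    """Late fusion: encode each modality separately, then sum the outputs."""
    return encoder(weights_a, x_a) + encoder(weights_b, x_b)
```

In joint training, a single loss on the output of `late_fusion` would drive the gradient updates of both encoders at once. That shared objective is the setting in which the abstract says modalities compete.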

research
06/21/2021

Improving Multi-Modal Learning with Uni-Modal Teachers

Learning multi-modal representations is an essential step towards real-w...
research
02/10/2022

Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

We hypothesize that due to the greedy nature of learning in multi-modal ...
research
06/17/2021

Knowledge distillation from multi-modal to mono-modal segmentation networks

The joint use of multiple imaging modalities for medical image segmentat...
research
05/29/2019

What Makes Training Multi-Modal Networks Hard?

Consider end-to-end training of a multi-modal vs. a single-modal network...
research
06/15/2020

Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion

With the development of web technology, multi-modal or multi-view data h...
research
12/29/2022

MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

As an important variant of entity alignment (EA), multi-modal entity ali...
research
01/29/2019

Deep Neural Networks with Auxiliary-Model Regulated Gating for Resilient Multi-Modal Sensor Fusion

Deep neural networks allow for fusion of high-level features from multip...
