Improving Multi-Modal Learning with Uni-Modal Teachers

06/21/2021
by Chenzhuang Du, et al.

Learning multi-modal representations is an essential step towards real-world robotic applications, and various multi-modal fusion models have been developed for this purpose. However, we observe that existing models, whose objectives are mostly based on joint training, often learn inferior representations of each modality. We name this problem Modality Failure, and hypothesize that the imbalance of modalities and the implicit bias of common objectives in fusion methods prevent the encoder of each modality from learning sufficient features. To this end, we propose a new multi-modal learning method, Uni-Modal Teacher, which combines the fusion objective and uni-modal distillation to tackle the modality failure problem. We show that our method not only drastically improves the representation of each modality, but also improves the overall multi-modal task performance. Our method can be effectively generalized to most multi-modal fusion approaches. We achieve more than 3% improvement on the VGGSound audio-visual classification task, as well as improved performance on the NYU Depth V2 RGB-D image segmentation task.
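As a rough illustration of what "combining the fusion objective and uni-modal distillation" could look like, the sketch below adds one feature-matching term per modality to an ordinary classification loss on the fused prediction. It is not taken from the paper: the function names, the use of mean squared error as the distillation term, and the `distill_weight` parameter are all assumptions made for this example. Teacher features are assumed to come from frozen encoders trained separately on a single modality.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example computed from raw logits (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(student_feat, teacher_feat):
    """Mean squared error between two equal-length feature vectors."""
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def uni_modal_teacher_loss(fused_logits, label, student_feats, teacher_feats,
                           distill_weight=1.0):
    """Fusion task loss plus one distillation term per modality.

    student_feats / teacher_feats map a modality name to a feature vector.
    ASSUMPTION: MSE feature matching and `distill_weight` are illustrative
    choices for this sketch, not details confirmed by the abstract.
    """
    task = softmax_cross_entropy(fused_logits, label)
    distill = sum(mse(student_feats[m], teacher_feats[m]) for m in student_feats)
    return task + distill_weight * distill


# Hypothetical two-modality example (RGB and depth) with made-up numbers.
loss = uni_modal_teacher_loss(
    fused_logits=[2.0, 0.5, -1.0],
    label=0,
    student_feats={"rgb": [0.1, 0.4], "depth": [0.3, 0.3]},
    teacher_feats={"rgb": [0.2, 0.5], "depth": [0.3, 0.1]},
)
```

When every student feature already matches its teacher, the distillation term vanishes and the loss reduces to the plain fusion objective, so in this sketch the extra term only affects encoders whose features drift away from what a uni-modal model would learn.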


research
05/02/2023

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

We abstract the features (i.e. learned representations) of multi-modal d...
research
03/23/2022

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Despite the remarkable success of deep multi-modal learning in practice,...
research
02/16/2023

NUAA-QMUL-AIIT at Memotion 3: Multi-modal Fusion with Squeeze-and-Excitation for Internet Meme Emotion Analysis

This paper describes the participation of our NUAA-QMUL-AIIT team in the...
research
07/17/2018

Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

The goal of multi-modal learning is to use complementary information on ...
research
02/19/2022

Multi-Modal Recurrent Fusion for Indoor Localization

This paper considers indoor localization using multi-modal wireless sign...
research
08/23/2022

DeepInteraction: 3D Object Detection via Modality Interaction

Existing top-performance 3D object detectors typically rely on the multi...
research
06/07/2019

Multi-modal Active Learning From Human Data: A Deep Reinforcement Learning Approach

Human behavior expression and experience are inherently multi-modal, and...
