From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation

10/15/2021
by Dhruv Agarwal, et al.

Multimodal deep learning has garnered much interest, and transformers have triggered novel approaches thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high computational resources demanded and the issue of missing modalities. We introduce, for the first time, the concept of knowledge distillation in transformers to use only one modality at inference time. We report a full study analyzing multiple student-teacher configurations, the levels at which distillation is applied, and different methodologies. With the best configuration, we improved the state-of-the-art accuracy by 3%, reduced the number of parameters by 2.5 times, and cut the inference time by 22%. Such a performance-computation tradeoff can be exploited in many applications, and we aim to open a new research area where the deployment of complex models with limited resources is demanded.
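The core mechanism the abstract relies on, distilling a teacher's soft predictions into a student, can be illustrated with a minimal sketch. This is not the authors' implementation; it is the standard soft-target distillation loss (Hinton-style), written here in plain NumPy, with the temperature `T` and mixing weight `alpha` as assumed hyperparameter names:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend KL(teacher || student) at temperature T with cross-entropy
    on the hard labels. A sketch of standard soft-target distillation,
    not the paper's exact objective."""
    p_t = softmax(teacher_logits, T)   # teacher soft targets
    p_s = softmax(student_logits, T)   # student soft predictions
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    probs = softmax(student_logits)
    hard = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    # T**2 compensates for the 1/T**2 gradient scaling of the soft term
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * hard))
```

In the multimodal-to-unimodal setting described above, `teacher_logits` would come from a cross-attention model seeing both modalities, while `student_logits` would come from a unimodal transformer, so that only the student runs at inference time.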


research · 06/13/2022
The Modality Focusing Hypothesis: On the Blink of Multimodal Knowledge Distillation
Multimodal knowledge distillation (KD) extends traditional knowledge dis...

research · 10/09/2022
Students taught by multimodal teachers are superior action recognizers
The focal point of egocentric video understanding is modelling hand-obje...

research · 07/14/2023
Multimodal Distillation for Egocentric Action Recognition
The focal point of egocentric video understanding is modelling hand-obje...

research · 04/13/2021
Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
In this work, we address the issues of missing modalities that have aris...

research · 07/14/2022
Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources
Although more layers and more parameters generally improve the accuracy ...

research · 05/27/2023
Vision Transformers for Small Histological Datasets Learned through Knowledge Distillation
Computational Pathology (CPATH) systems have the potential to automate d...

research · 10/27/2022
Multimodal Transformer Distillation for Audio-Visual Synchronization
Audio-visual synchronization aims to determine whether the mouth movemen...
