Modality-specific Distillation

01/06/2021
by Woojeong Jin, et al.

Large neural networks are impractical to deploy on mobile devices due to their heavy computational cost and slow inference. Knowledge distillation (KD) is a technique to reduce model size while retaining performance by transferring knowledge from a large "teacher" model to a smaller "student" model. However, KD on multimodal datasets such as vision-language datasets is relatively unexplored, and digesting such multimodal information is challenging because different modalities present different types of information. In this paper, we propose modality-specific distillation (MSD) to effectively transfer knowledge from a teacher on multimodal datasets. Existing KD approaches can be applied to the multimodal setup, but the student then has no access to the teacher's modality-specific predictions. Our idea is to have the student mimic the teacher's modality-specific predictions by introducing an auxiliary loss term for each modality. Because each modality has different importance for predictions, we also propose weighting approaches for the auxiliary losses, including a meta-learning approach that learns the optimal weights on these loss terms. In our experiments, we demonstrate the effectiveness of MSD and the weighting schemes and show that MSD achieves better performance than vanilla KD.
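As a rough illustration of how such auxiliary terms could be combined, the sketch below adds per-modality distillation losses on top of a standard soft-label KD loss for a vision-language classifier. The `student`/`teacher` call signatures, the fixed weights `w_image`/`w_text`, and the temperature are illustrative assumptions rather than the paper's actual implementation; in particular, the paper proposes learning the per-modality weights (e.g., via meta-learning) instead of fixing them by hand.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Standard soft-label distillation: KL divergence between
    temperature-softened teacher and student distributions."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

def msd_loss(student, teacher, image, text, labels,
             w_joint=1.0, w_image=0.5, w_text=0.5, T=2.0):
    """Modality-specific distillation sketch (assumed interface:
    models accept `image` and/or `text` and return class logits).

    Besides the usual KD term on the joint (image + text) input, the
    student also mimics the teacher's predictions when only a single
    modality is fed to both models. The weights w_image / w_text are
    placeholders for the learned weights described in the paper.
    """
    # Supervised task loss on the ground-truth labels.
    s_joint = student(image=image, text=text)
    task = F.cross_entropy(s_joint, labels)

    # Joint-input distillation (vanilla KD).
    with torch.no_grad():
        t_joint = teacher(image=image, text=text)
    joint_kd = kd_loss(s_joint, t_joint, T)

    # Modality-specific auxiliary losses: feed one modality at a time
    # and match the teacher's modality-specific predictions.
    with torch.no_grad():
        t_img = teacher(image=image, text=None)
        t_txt = teacher(image=None, text=text)
    img_kd = kd_loss(student(image=image, text=None), t_img, T)
    txt_kd = kd_loss(student(image=None, text=text), t_txt, T)

    return task + w_joint * joint_kd + w_image * img_kd + w_text * txt_kd
```

In the weighting schemes discussed in the abstract, the constants `w_image` and `w_text` would be replaced by weights chosen per dataset or per instance, for example by a meta-learner optimized against the student's validation performance.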

