Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

07/06/2023
by Xuanlin Li, et al.

Large vision-language models have achieved outstanding performance, but their size and computational requirements make deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that preserve the performance of larger models, is a promising direction toward a solution. This paper investigates the distillation of visual representations from large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that previous model distillation literature has overlooked. We propose two principles, from the vision and language modality perspectives, to enhance the student's OOD generalization: (1) better imitating the teacher's visual representation space while carefully promoting coherence in vision-language alignment with the teacher; and (2) enriching the teacher's language representations with informative and fine-grained semantic attributes so that different labels can be distinguished effectively. We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Our code will be released at https://github.com/xuanlinli17/large_vlm_distillation_ood
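The two principles above can be pictured with a minimal PyTorch-style sketch of a distillation objective. This is not the authors' released code; the tensor names, the temperature, and the loss weighting are illustrative assumptions. It pairs a term that imitates the teacher's visual embeddings with a contrastive term that keeps the student's image embeddings aligned with the teacher's (frozen) text embeddings.

```python
# Minimal sketch (not the authors' released implementation) of a distillation
# objective combining visual-space imitation with vision-language alignment.
# All names, the temperature, and align_weight are illustrative assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(student_visual, teacher_visual, teacher_text, labels,
                      temperature=0.07, align_weight=1.0):
    """student_visual: (B, D) student image embeddings
    teacher_visual:    (B, D) teacher image embeddings (projected to dim D)
    teacher_text:      (C, D) teacher text embeddings, one per class label
    labels:            (B,)   ground-truth class indices
    """
    # (1) Imitate the teacher's visual representation space:
    #     cosine distance between matching image embeddings.
    s = F.normalize(student_visual, dim=-1)
    t = F.normalize(teacher_visual, dim=-1)
    imitation = (1.0 - (s * t).sum(dim=-1)).mean()

    # (2) Promote coherence with the teacher's vision-language alignment:
    #     contrastive classification of student image embeddings against
    #     the teacher's frozen text embeddings for each label.
    txt = F.normalize(teacher_text, dim=-1)
    logits = s @ txt.t() / temperature      # (B, C) similarity scores
    alignment = F.cross_entropy(logits, labels)

    return imitation + align_weight * alignment


if __name__ == "__main__":
    B, C, D = 8, 5, 512                     # batch, classes, embedding dim
    loss = distillation_loss(torch.randn(B, D), torch.randn(B, D),
                             torch.randn(C, D), torch.randint(0, C, (B,)))
    print(float(loss))
```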


Related research

05/24/2023
Large Language Model Distillation Doesn't Need a Teacher
Knowledge distillation trains a smaller student model to match the outpu...

07/25/2022
Black-box Few-shot Knowledge Distillation
Knowledge distillation (KD) is an efficient approach to transfer the kno...

09/25/2019
Extreme Language Model Compression with Optimal Subwords and Shared Projections
Pre-trained deep neural network language models such as ELMo, GPT, BERT ...

07/06/2021
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Since visual perception can give rich information beyond text descriptio...

12/28/2022
OVO: One-shot Vision Transformer Search with Online distillation
Pure transformers have shown great potential for vision tasks recently. ...

02/28/2022
Multi-modal Alignment using Representation Codebook
Aligning signals from different modalities is an important step in visio...

06/14/2023
Knowledge Distillation of Large Language Models
Knowledge Distillation (KD) is a promising technique for reducing the hi...
