Improving CLIP Robustness with Knowledge Distillation and Self-Training

09/19/2023
by   Clement Laroudie, et al.

This paper examines the robustness of a multi-modal computer vision model, CLIP (Contrastive Language-Image Pretraining), in the context of unsupervised learning. The objective is twofold: first, to evaluate the robustness of CLIP, and second, to explore strategies for improving it. To this end, we introduce a novel approach named LP-CLIP, which distills CLIP features through a linear probing layer placed on top of its image encoder. This added layer is trained with pseudo-labels produced by CLIP itself, combined with a self-training strategy. LP-CLIP thus offers a promising way to enhance the robustness of CLIP without requiring annotations, which makes it particularly valuable when labeled data are scarce or costly to obtain. By leveraging a simple linear probing layer, we improve the model's ability to withstand the various uncertainties and challenges commonly encountered in real-world scenarios. Our approach increases the robustness of CLIP and achieves state-of-the-art results compared to supervised techniques on various datasets.
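To make the pipeline concrete, below is a minimal sketch of the idea as described in the abstract: a linear probe on top of a frozen CLIP image encoder, trained on CLIP's own zero-shot pseudo-labels with a confidence-filtered self-training step. The class names, prompt template, confidence threshold, and optimizer settings are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()  # the CLIP backbone stays frozen throughout

# Placeholder label set and prompt template (assumptions for illustration).
class_names = ["airplane", "automobile", "bird"]
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_features = F.normalize(model.encode_text(prompts), dim=-1)

# Linear probing layer on top of CLIP's frozen image embedding.
probe = torch.nn.Linear(model.visual.output_dim, len(class_names)).to(device)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

def self_training_step(images, threshold=0.9):
    """One step: pseudo-label a batch with zero-shot CLIP, keep only
    confident samples, and update the probe alone (assumed threshold)."""
    with torch.no_grad():
        img = F.normalize(model.encode_image(images), dim=-1)
        zero_shot = (100.0 * img @ text_features.T).softmax(dim=-1)
        conf, pseudo = zero_shot.max(dim=-1)
        mask = conf >= threshold  # discard low-confidence pseudo-labels
    if mask.sum() == 0:
        return None
    logits = probe(img[mask].float())
    loss = F.cross_entropy(logits, pseudo[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, `images` is a batch of tensors already transformed with CLIP's `preprocess`; only the probe's parameters receive gradients, which is what keeps the method annotation-free.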

