ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation

08/04/2023
by Xuefeng Hu, et al.

Large-scale pre-trained vision-language models such as CLIP have demonstrated outstanding performance in zero-shot classification, e.g. achieving 76.3% top-1 accuracy on ImageNet without seeing any labeled example, which suggests potential benefits for many tasks that lack labeled data. However, when applying CLIP to a downstream target domain, the presence of visual and text domain gaps and cross-modality misalignment can greatly degrade model performance. To address these challenges, we propose ReCLIP, the first source-free domain adaptation method for vision-language models, which requires neither source data nor labeled target data. ReCLIP first learns a projection space to mitigate misaligned visual-text embeddings and to generate pseudo labels, and then deploys cross-modality self-training with those pseudo labels to update the visual and text encoders, refining the labels and reducing domain gaps and misalignment iteratively. With extensive experiments, we demonstrate that ReCLIP reduces the average error rate of CLIP from 30.17% to 25.06% on 22 image classification benchmarks.
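The abstract outlines a two-stage procedure: project embeddings to reduce the cross-modality gap, then self-train on pseudo labels. The NumPy sketch below illustrates roughly what those stages could look like on frozen embeddings; it is not the paper's implementation. The projection choice (removing the mean text direction), the centroid-based update, and all array shapes are assumptions made purely for illustration.

```python
# Minimal, illustrative sketch (NOT the authors' implementation) of the two
# ideas named in the abstract: (1) a projection that mitigates cross-modality
# misalignment between image and text embeddings, and (2) pseudo labels from
# nearest text embeddings, used for a self-training-style update.
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Stand-ins for CLIP encoder outputs on an unlabeled target domain:
# 200 image embeddings and 10 class-name text embeddings, dim 64 (assumed).
img = l2_normalize(rng.normal(size=(200, 64)))
txt = l2_normalize(rng.normal(size=(10, 64)))

# (1) Project out the mean text direction, a simple proxy for "learning a
# projection space" that reduces the modality gap (assumed choice).
mean_dir = l2_normalize(txt.mean(axis=0))
P = np.eye(64) - np.outer(mean_dir, mean_dir)
img_p = l2_normalize(img @ P)
txt_p = l2_normalize(txt @ P)

# (2) Pseudo labels: each image is assigned its most similar class embedding
# in the projected space (cosine similarity via dot product of unit vectors).
pseudo = (img_p @ txt_p.T).argmax(axis=1)

# Self-training step (sketch): nudge each image embedding toward its
# pseudo-labeled class centroid. The actual method instead updates the
# visual and text encoders with the pseudo labels, iteratively.
for cls in range(10):
    members = pseudo == cls
    if members.any():
        centroid = l2_normalize(img_p[members].mean(axis=0))
        img_p[members] = l2_normalize(0.9 * img_p[members] + 0.1 * centroid)

print("pseudo-label counts per class:", np.bincount(pseudo, minlength=10))
```

In the real method, the pseudo labels would be recomputed after each encoder update so that label refinement and gap reduction reinforce each other across iterations, as the abstract describes.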


Related research

Open-Set Domain Adaptation with Visual-Language Foundation Models (07/30/2023)
Unsupervised domain adaptation (UDA) has proven to be very effective in ...

Feature Adaptation of Pre-Trained Language Models across Languages and Domains for Text Classification (09/24/2020)
Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains...

Unsupervised Adaptation of Polyp Segmentation Models via Coarse-to-Fine Self-Supervision (08/13/2023)
Unsupervised Domain Adaptation (UDA) has attracted a surge of interest o...

SRoUDA: Meta Self-training for Robust Unsupervised Domain Adaptation (12/12/2022)
As acquiring manual labels on data could be costly, unsupervised domain ...

Unsupervised Camouflaged Object Segmentation as Domain Adaptation (08/08/2023)
Deep learning for unsupervised image segmentation remains challenging du...

Black Box Few-Shot Adaptation for Vision-Language models (04/04/2023)
Vision-Language (V-L) models trained with contrastive learning to align ...

Understanding and Improving Visual Prompting: A Label-Mapping Perspective (11/21/2022)
We revisit and advance visual prompting (VP), an input prompting techniq...
