Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning

06/02/2023
by Cristina Menghini, et al.

Fine-tuning vision-language models (VLMs) like CLIP to downstream tasks is often necessary to optimize their performance. However, a major obstacle is the limited availability of labeled data. We study the use of pseudolabels, i.e., heuristic labels for unlabeled data, to enhance CLIP via prompt tuning. Conventional pseudolabeling trains a model on labeled data and then generates labels for unlabeled data. VLMs' zero-shot capabilities enable a “second generation” of pseudolabeling approaches that do not require task-specific training on labeled data. By using zero-shot pseudolabels as a source of supervision, we observe that learning paradigms such as semi-supervised, transductive zero-shot, and unsupervised learning can all be seen as optimizing the same loss function. This unified view enables the development of versatile training strategies that are applicable across learning paradigms. We investigate these strategies on image classification tasks where CLIP exhibits limitations, varying both the prompt modality (e.g., textual or visual prompts) and the learning paradigm. We find that (1) unexplored prompt tuning strategies that iteratively refine pseudolabels consistently improve CLIP accuracy, by 19.5 points in semi-supervised learning, 28.4 points in transductive zero-shot learning, and 15.2 points in unsupervised learning, and (2) unlike conventional semi-supervised pseudolabeling, which exacerbates model biases toward classes with higher-quality pseudolabels, prompt tuning leads to a more equitable distribution of per-class accuracy. The code to reproduce the experiments is at github.com/BatsResearch/menghini-enhanceCLIPwithCLIP-code.
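As a rough illustration of the “second generation” pseudolabeling idea described above, the sketch below generates zero-shot pseudolabels with CLIP by matching unlabeled images against class prompts and keeping only the most confident images per class. It is a minimal sketch using the open-source `clip` package and PyTorch; the function name `zero_shot_pseudolabels`, the placeholder class names, and the `top_k` class-balanced selection are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of zero-shot pseudolabeling with CLIP (not the paper's code).
# Assumes the `clip` package from github.com/openai/CLIP and PyTorch.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["cat", "dog", "bird"]  # placeholder classes for illustration
text_tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

@torch.no_grad()
def zero_shot_pseudolabels(image_batch, top_k=16):
    """Assign each (preprocessed) unlabeled image to its most similar class
    prompt and keep only the `top_k` most confident images per class."""
    image_features = model.encode_image(image_batch)
    text_features = model.encode_text(text_tokens)
    # Cosine similarity between image and class-prompt embeddings.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    confidence, pseudolabels = probs.max(dim=-1)  # per-image best class

    keep = []
    for c in range(len(class_names)):
        idx = (pseudolabels == c).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        ranked = idx[confidence[idx].argsort(descending=True)]
        keep.append(ranked[:top_k])  # class-balanced, confidence-ranked selection
    keep = torch.cat(keep)
    return keep, pseudolabels[keep]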


