Consistency-guided Prompt Learning for Vision-Language Models

06/01/2023
by Shuvendu Roy, et al.

We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models. It addresses the challenge of improving the generalization capability of large foundation models while fine-tuning them on downstream tasks in a few-shot setting. The basic idea of CoPrompt is to enforce a consistency constraint between the predictions of the trainable and pre-trained models to prevent overfitting on the downstream task. Additionally, we introduce two components into our consistency constraint to further boost performance: enforcing consistency on two perturbed inputs, and combining two dominant tuning paradigms, prompting and adapters. Enforcing consistency on perturbed inputs further regularizes the consistency constraint, effectively improving generalization, while tuning additional parameters with prompts and adapters improves performance on downstream tasks. Extensive experiments show that CoPrompt outperforms existing methods on a range of evaluation suites, including base-to-novel generalization, domain generalization, and cross-dataset evaluation tasks. On the generalization task, CoPrompt improves the state-of-the-art by 2.09 on the harmonic mean over 11 recognition datasets. Detailed ablation studies show the effectiveness of each of the components in CoPrompt.
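The consistency constraint described above can be illustrated with a minimal sketch. The abstract does not say which distance measure or loss weighting is used, so the cosine-distance form, the `weight` parameter, and all function names below are assumptions made for illustration, not the authors' implementation. The two feature vectors stand in for the embeddings that the trainable and frozen pre-trained models would produce from two differently perturbed versions of the same input.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def consistency_loss(trainable_feat, frozen_feat):
    """Assumed form: 1 - cosine similarity.

    The value is 0 when the trainable model's embedding points in the
    same direction as the frozen model's embedding, and grows as the
    two drift apart.
    """
    return 1.0 - cosine_similarity(trainable_feat, frozen_feat)


def total_loss(task_loss, trainable_feat, frozen_feat, weight=1.0):
    """Downstream task loss plus a weighted consistency penalty.

    `weight` is a hypothetical hyperparameter that trades off fitting
    the few-shot task against staying close to the pre-trained model.
    """
    return task_loss + weight * consistency_loss(trainable_feat, frozen_feat)
```

In this sketch, a trainable embedding that stays aligned with the frozen one adds no penalty, while one that has drifted to an orthogonal direction adds the full `weight` to the task loss, which is one way to discourage overfitting to the few-shot data.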

