CPL: Counterfactual Prompt Learning for Vision and Language Models

10/19/2022
by   Xuehai He, et al.

Prompt tuning is a new few-shot transfer learning technique that tunes only the learnable prompt for pre-trained vision and language models such as CLIP. However, existing prompt tuning methods tend to learn spurious or entangled representations, which leads to poor generalization to unseen concepts. To enable non-spurious and efficient prompt learning from limited examples, this paper presents a novel Counterfactual Prompt Learning (CPL) method for vision and language models, which simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework. Specifically, CPL constructs counterfactuals by identifying the minimal non-spurious feature change between semantically similar positive and negative samples that causes a concept change, and learns a more generalizable prompt representation from both factual and counterfactual examples via contrastive learning. Extensive experiments demonstrate that CPL achieves better few-shot performance on different vision and language tasks than previous prompt tuning methods on CLIP. On image classification, we achieve a 3.55% average relative improvement on unseen classes across seven datasets; on image-text retrieval and visual question answering, we gain up to 4.09% and 25.08% relative improvements, respectively, across three few-shot scenarios on unseen test sets.
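The abstract does not give the formulation, so the following is only a minimal sketch of the two ingredients it names: building a counterfactual from a positive and a negative sample, and a contrastive objective over the factual and counterfactual examples. Everything specific here is an assumption made for illustration: the per-dimension mixing mask `u`, the linear blend in `make_counterfactual`, the InfoNCE-style loss, the temperature of 0.07, and all function names are hypothetical and are not taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_counterfactual(v_pos, v_neg, u):
    """Assumed blend: u[i] = 0 keeps the positive feature, u[i] = 1 swaps in the negative one.

    A "minimal" change would correspond to a mask u that is as small as
    possible while still changing the predicted concept; finding that mask
    is not shown here.
    """
    return [(1.0 - ui) * p + ui * n for p, n, ui in zip(v_pos, v_neg, u)]


def contrastive_loss(text_feat, v_pos, v_cf, temperature=0.07):
    """InfoNCE-style loss (an assumption): the prompt's text feature should be
    closer to the factual image feature than to the counterfactual one."""
    t = normalize(text_feat)
    logits = [
        dot(t, normalize(v_pos)) / temperature,
        dot(t, normalize(v_cf)) / temperature,
    ]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]  # negative log-probability of the factual pair


# Toy 2-D features, chosen by hand purely for illustration.
v_pos = [1.0, 0.0]   # image feature of the positive sample
v_neg = [0.0, 1.0]   # image feature of a semantically similar negative sample
prompt = [1.0, 0.0]  # text feature produced by the learnable prompt

no_change = make_counterfactual(v_pos, v_neg, [0.0, 0.0])
full_swap = make_counterfactual(v_pos, v_neg, [1.0, 1.0])

loss_same = contrastive_loss(prompt, v_pos, no_change)
loss_swap = contrastive_loss(prompt, v_pos, full_swap)
```

In this toy setup, a counterfactual identical to the positive sample gives the prompt nothing to distinguish, so the loss sits at chance level, while a counterfactual that differs along the relevant dimension yields a much lower loss. In a real few-shot setting the features would come from CLIP's encoders and the prompt would be updated by gradient descent on such a loss; neither step is shown here.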


