Understanding and Improving Visual Prompting: A Label-Mapping Perspective

11/21/2022
by Aochuan Chen, et al.

We revisit and advance visual prompting (VP), an input prompting technique for vision tasks. VP can reprogram a fixed, pre-trained source model to accomplish downstream tasks in the target domain by simply incorporating universal prompts (in terms of input perturbation patterns) into downstream data points. Yet, it remains elusive why VP stays effective even given a ruleless label mapping (LM) between the source classes and the target classes. Inspired by the above, we ask: How is LM interrelated with VP? And how can such a relationship be exploited to improve VP's accuracy on target tasks? We peer into the influence of LM on VP and provide an affirmative answer that a better 'quality' of LM (assessed by mapping precision and explanation) can consistently improve the effectiveness of VP. This is in contrast to the prior art, where the factor of LM was missing. To optimize LM, we propose a new VP framework, termed ILM-VP (iterative label mapping-based visual prompting), which automatically re-maps the source labels to the target labels and progressively improves the target task accuracy of VP. Further, when using a contrastive language-image pretrained (CLIP) model, we propose to integrate an LM process to assist the text prompt selection of CLIP and to improve the target task accuracy. Extensive experiments demonstrate that our proposal significantly outperforms state-of-the-art VP methods. For example, we show that when reprogramming an ImageNet-pretrained ResNet-18 to 13 target tasks, our method outperforms baselines by a substantial margin, e.g., 7.9% and 6.7% accuracy improvements in transfer learning to the target Flowers102 and CIFAR100 datasets. Besides, our proposal on CLIP-based VP provides 13.7% and 7.1% accuracy improvements on the Flowers102 and DTD datasets, respectively. Code is available at https://github.com/OPTML-Group/ILM-VP.
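To make the ILM-VP recipe concrete, the sketch below pairs a learnable pad-style visual prompt with a per-epoch re-mapping of source labels to target labels. It is a minimal illustration of the idea described in the abstract, not the authors' released implementation: the pad-prompt layout, the greedy frequency-based re-mapping in remap_labels (the paper's mapping procedure may differ), the 1000-class ImageNet source head, and all hyperparameters are assumptions made for this example.

```python
# Minimal sketch of iterative label mapping-based visual prompting (ILM-VP).
# Assumptions (not taken from the paper's code): a pad-style prompt around a
# resized image and a greedy, frequency-based source->target re-mapping per epoch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PadPrompt(nn.Module):
    """Universal input perturbation: a learnable frame padded around each image."""

    def __init__(self, image_size=224, pad=16):
        super().__init__()
        self.pad = pad
        self.inner = image_size - 2 * pad
        self.delta = nn.Parameter(torch.zeros(1, 3, image_size, image_size))
        mask = torch.ones(1, 1, image_size, image_size)
        mask[:, :, pad:-pad, pad:-pad] = 0           # only the border is trainable
        self.register_buffer("mask", mask)

    def forward(self, x):
        x = F.interpolate(x, size=self.inner, mode="bilinear", align_corners=False)
        x = F.pad(x, [self.pad] * 4)                 # target image sits in the center
        return x + self.delta * self.mask            # add the universal prompt


@torch.no_grad()
def remap_labels(model, prompt, loader, num_target_classes, num_source_classes, device):
    """Map each target class to the source class it is most frequently predicted as."""
    counts = torch.zeros(num_target_classes, num_source_classes, device=device)
    for x, y in loader:
        preds = model(prompt(x.to(device))).argmax(dim=1)
        counts.index_put_((y.to(device), preds),
                          torch.ones_like(preds, dtype=counts.dtype), accumulate=True)
    return counts.argmax(dim=1)                      # target index -> source index


def train_ilm_vp(model, loader, num_target_classes, epochs=10, device="cuda"):
    model.eval().to(device)
    for p in model.parameters():                     # the source model stays frozen
        p.requires_grad_(False)
    prompt = PadPrompt().to(device)
    opt = torch.optim.Adam(prompt.parameters(), lr=0.01)
    for _ in range(epochs):
        # Re-derive the label mapping under the current prompt (the "iterative" part).
        mapping = remap_labels(model, prompt, loader, num_target_classes, 1000, device)
        for x, y in loader:                          # then train the prompt under that mapping
            loss = F.cross_entropy(model(prompt(x.to(device))), mapping[y.to(device)])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return prompt, mapping
```

The point the abstract emphasizes is captured by the outer loop: the label mapping is recomputed as the prompt improves rather than fixed once before training, which is what distinguishes ILM-VP from prior VP methods with a static, ruleless mapping.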

