Unleashing the Power of Visual Prompting At the Pixel Level

12/20/2022
by Junyang Wu, et al.

This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks. Our method includes two key designs. First, rather than directly adding the prompt to the image, we treat the prompt as an extra, independent learnable component. We show that the strategy for reconciling the prompt and the image matters, and find that warping the prompt around a properly shrunk image works best empirically. Second, we re-introduce into visual prompting two "old tricks" commonly used in building transferable adversarial examples, i.e., input diversity and gradient normalization. These techniques improve optimization and enable the prompt to generalize better. We provide extensive experimental results to demonstrate the effectiveness of our method. Using a CLIP model, our prompting method sets a new record of 82.8% average accuracy across 12 popular classification datasets, substantially surpassing the prior art by +5.6%. This prompting performance already outperforms linear probing by +2.1% and can even match full fine-tuning on certain datasets. In addition, our prompting method shows competitive performance across different data scales and against distribution shifts. The code is publicly available at https://github.com/UCSC-VLAA/EVP.
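To make the two designs concrete, here is a minimal, stdlib-only sketch under heavy simplifying assumptions; it is not the authors' implementation (see the repository above for that). Assumptions, all hypothetical: images are single-channel square grids rather than RGB tensors; the image is shrunk with nearest-neighbour sampling and centred, while the learnable prompt fills only the surrounding border; the frozen CLIP model is replaced by a toy linear score `s = sum(w * x)` so the gradient can be written by hand; a random horizontal flip stands in for input diversity; and gradient normalization is assumed to mean rescaling the gradient to unit L2 norm. The names `shrink`, `compose`, `border_mask`, `prompt_step`, and `score` are illustrative only.

```python
import math
import random

Grid = list[list[float]]  # toy single-channel H x W image, prompt, or weights


def shrink(image: Grid, new_size: int) -> Grid:
    """Nearest-neighbour downsampling of a square grid to new_size x new_size."""
    size = len(image)
    return [[image[r * size // new_size][c * size // new_size]
             for c in range(new_size)] for r in range(new_size)]


def border_mask(size: int, inner: int) -> list[list[bool]]:
    """True where the prompt is visible: outside the centred inner x inner window."""
    lo = (size - inner) // 2
    return [[not (lo <= r < lo + inner and lo <= c < lo + inner)
             for c in range(size)] for r in range(size)]


def compose(image: Grid, prompt: Grid, inner: int) -> Grid:
    """Shrink the image into the centre; the prompt occupies only the border.

    The prompt is a separate component: it is never added to image pixels.
    """
    lo = (len(prompt) - inner) // 2
    small = shrink(image, inner)
    out = [row[:] for row in prompt]
    for r in range(inner):
        for c in range(inner):
            out[lo + r][lo + c] = small[r][c]
    return out


def hflip(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]


def l2_normalize(grad: Grid) -> Grid:
    """Gradient normalization (assumed L2): rescale the gradient to unit norm."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    return grad if norm == 0.0 else [[g / norm for g in row] for row in grad]


def score(x: Grid, weights: Grid) -> float:
    """Toy linear 'model' s = sum(w * x), standing in for a frozen classifier logit."""
    return sum(w * v for wr, xr in zip(weights, x) for w, v in zip(wr, xr))


def prompt_step(prompt: Grid, inner: int, weights: Grid, lr: float,
                rng: random.Random) -> Grid:
    """One gradient-ascent step on the prompt for the toy score.

    Input diversity: the composed input is randomly flipped before the model
    sees it. For s = sum(w * flip(x)), ds/dx = flip(w), which is already
    expressed in the unflipped (prompt) coordinates. Because the toy score is
    linear, the gradient does not depend on the image, so no image is needed here.
    """
    grad_x = hflip(weights) if rng.random() < 0.5 else weights
    mask = border_mask(len(prompt), inner)
    # Only border pixels belong to the prompt; the centre is the shrunk image.
    grad_p = [[g if m else 0.0 for g, m in zip(gr, mr)]
              for gr, mr in zip(grad_x, mask)]
    grad_p = l2_normalize(grad_p)
    return [[p + lr * g for p, g in zip(pr, gr)] for pr, gr in zip(prompt, grad_p)]


# Usage: an 8x8 canvas with the image shrunk to 4x4 in the middle.
rng = random.Random(0)
size, inner = 8, 4
image = [[float(r * size + c) for c in range(size)] for r in range(size)]
prompt = [[0.0] * size for _ in range(size)]
weights = [[1.0] * size for _ in range(size)]  # flip-symmetric toy weights
before = score(compose(image, prompt, inner), weights)
for _ in range(10):
    prompt = prompt_step(prompt, inner, weights, lr=0.1, rng=rng)
after = score(compose(image, prompt, inner), weights)
print(f"score before={before:.3f} after={after:.3f}")
```

Because each step uses a unit-norm gradient, the step size is controlled by `lr` alone, independent of the raw gradient magnitude. That decoupling is the usual motivation for gradient normalization in the adversarial-example literature. With the symmetric toy weights above, every step raises the score by exactly `lr * sqrt(48)`, the norm of the all-ones gradient over the 48 border pixels.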
