Rethinking Visual Prompt Learning as Masked Visual Token Modeling

by Ning Liao, et al.
HUAWEI Technologies Co., Ltd.
Shanghai Jiao Tong University
Soochow University

Prompt learning has achieved great success in efficiently exploiting large-scale pre-trained models in natural language processing (NLP). It reformulates downstream tasks as generative pre-training tasks, thus narrowing the gap between them and stably improving performance. However, when transferred to the vision domain, current visual prompt learning methods are all designed on discriminative pre-trained models, and they lack a careful design that unifies the forms of the pre-training and downstream tasks. To explore prompt learning on generative pre-trained visual models while keeping task consistency, we propose Visual Prompt learning as masked visual Token Modeling (VPTM), which transforms downstream visual classification into the pre-trained masked visual token prediction task. In addition, we develop a prototypical verbalizer that maps the predicted visual token, which carries implicit semantics, to explicit downstream labels. To the best of our knowledge, VPTM is the first visual prompt method built on a generative pre-trained visual model, and the first to achieve consistency between pre-training and downstream visual classification through task reformulation. Experiments show that VPTM outperforms other visual prompt methods and achieves excellent efficiency. Moreover, the task consistency of VPTM contributes to its robustness against changes in prompt location, prompt length, and prototype dimension, allowing it to be deployed uniformly.
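The verbalizer step described above can be pictured as a nearest-prototype lookup: the generative model predicts an embedding at the masked visual token position, and that embedding is matched against a learned prototype per class. The following is a minimal sketch of that matching step only; the function names, the cosine-similarity choice, and the toy prototypes are illustrative assumptions, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def prototypical_verbalizer(masked_token_embedding, class_prototypes):
    """Map the embedding predicted at the masked position to the class
    whose prototype it is most similar to (hypothetical helper)."""
    scores = {label: cosine(masked_token_embedding, proto)
              for label, proto in class_prototypes.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores

# Toy example: 3-dim embeddings, two classes with made-up prototypes.
prototypes = {"cat": [1.0, 0.1, 0.0], "dog": [0.0, 0.2, 1.0]}
predicted = [0.9, 0.2, 0.1]  # embedding predicted at the masked slot
label, scores = prototypical_verbalizer(predicted, prototypes)
```

In the paper's setting the prototypes are learned jointly with the prompts, so the mapping from implicit token semantics to explicit labels is trained rather than hand-assigned as in this toy example.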




