Rethinking Visual Prompt Learning as Masked Visual Token Modeling

03/09/2023
by Ning Liao, et al.
HUAWEI Technologies Co., Ltd.
Shanghai Jiao Tong University
Soochow University

Prompt learning has achieved great success in efficiently exploiting large-scale pre-trained models in natural language processing (NLP). It reformulates downstream tasks as the generative pre-training task, narrowing the gap between them and yielding stable performance improvements. However, when transferred to the vision domain, current visual prompt learning methods are all built on discriminative pre-trained models, and they lack a careful design that unifies the forms of the pre-training and downstream tasks. To explore prompt learning on a generative pre-trained visual model while keeping the tasks consistent, we propose Visual Prompt learning as masked visual Token Modeling (VPTM), which transforms downstream visual classification into the pre-trained masked visual token prediction task. In addition, we develop a prototypical verbalizer that maps the predicted visual tokens, which carry implicit semantics, to explicit downstream labels. To the best of our knowledge, VPTM is the first visual prompt method built on a generative pre-trained visual model, and the first to achieve consistency between pre-training and downstream visual classification through task reformulation. Experiments show that VPTM outperforms other visual prompt methods and achieves excellent efficiency. Moreover, the task consistency of VPTM makes it robust to the choice of prompt location, prompt length, and prototype dimension, and allows it to be deployed uniformly.
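To make the idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of how such a method could be wired up: a frozen generative backbone stands in for a BEiT-style pre-trained model, learnable prompt tokens and a single mask token are appended to the patch embeddings, and the hidden state at the mask position is matched against learnable class prototypes acting as the verbalizer. All module and parameter names (e.g. VPTMSketch, proto_dim) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VPTMSketch(nn.Module):
    """Illustrative sketch of prompt learning via masked visual token modeling.

    A frozen encoder (stand-in for a generative pre-trained visual model such
    as BEiT) receives patch embeddings, a few learnable prompt tokens, and one
    [MASK] token. The output at the mask position is projected and compared
    with learnable class prototypes (a 'prototypical verbalizer') to obtain
    downstream class scores.
    """

    def __init__(self, embed_dim=768, prompt_len=5, num_classes=100, proto_dim=256):
        super().__init__()
        # Frozen backbone stand-in; only prompts, mask token, projection and
        # prototypes are trained.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=12, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():
            p.requires_grad = False

        # Learnable prompt tokens and a single mask token.
        self.prompts = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        self.mask_token = nn.Parameter(torch.randn(1, embed_dim) * 0.02)

        # Prototypical verbalizer: one learnable prototype per downstream class.
        self.proj = nn.Linear(embed_dim, proto_dim)
        self.prototypes = nn.Parameter(torch.randn(num_classes, proto_dim) * 0.02)

    def forward(self, patch_embeds):
        # patch_embeds: (B, num_patches, embed_dim), e.g. from a frozen patch embedder.
        b = patch_embeds.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        mask = self.mask_token.unsqueeze(0).expand(b, -1, -1)
        tokens = torch.cat([prompts, patch_embeds, mask], dim=1)

        hidden = self.backbone(tokens)
        mask_hidden = hidden[:, -1]  # representation at the [MASK] slot

        # Cosine similarity to class prototypes serves as the verbalizer.
        z = F.normalize(self.proj(mask_hidden), dim=-1)
        protos = F.normalize(self.prototypes, dim=-1)
        return z @ protos.t()  # (B, num_classes) class logits


if __name__ == "__main__":
    model = VPTMSketch()
    dummy = torch.randn(2, 196, 768)  # fake patch embeddings
    print(model(dummy).shape)         # torch.Size([2, 100])
```

In this sketch the classification objective is expressed entirely as filling in a masked position, which mirrors how the paper keeps the downstream task in the same form as generative pre-training rather than attaching a separate discriminative head.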


