Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning

09/12/2023
by Chunqing Ruan, et al.

Parameter-efficient transfer learning (PETL) is an emerging research topic that aims to adapt large-scale pre-trained models to downstream tasks. Recent PETL methods have achieved great success in reducing storage and computation costs. However, these methods do not take instance-specific visual cues into account for visual tasks. In this paper, we propose a Dynamic Visual Prompt Tuning framework (DVPT), which generates a dynamic instance-wise token for each image. This token captures the unique visual features of each image, which makes it better suited to downstream visual tasks. To produce it, we design a Meta-Net module that generates learnable prompts conditioned on each image, thereby capturing dynamic instance-wise visual features. Extensive experiments on a wide range of downstream recognition tasks show that DVPT outperforms other PETL methods. More importantly, DVPT even outperforms full fine-tuning on 17 out of 19 downstream tasks while maintaining high parameter efficiency. Our code will be released soon.
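
To make the idea of an instance-wise prompt concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify the Meta-Net architecture, so everything here is an assumption for illustration: the function names (`init_meta_net`, `meta_net`, `add_dynamic_prompt`), the mean pooling over patch embeddings, the two-layer bottleneck with a `tanh` activation, and the choice to prepend the generated token to the patch sequence.

```python
import numpy as np


def init_meta_net(dim, hidden, rng):
    """Parameters of a hypothetical bottleneck MLP: dim -> hidden -> dim."""
    return {
        "w1": rng.normal(0.0, 0.02, size=(dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.02, size=(hidden, dim)),
        "b2": np.zeros(dim),
    }


def meta_net(params, patch_tokens):
    """Map patch embeddings (B, N, D) to one prompt token per image (B, 1, D)."""
    pooled = patch_tokens.mean(axis=1)                    # (B, D), assumed pooling
    h = np.tanh(pooled @ params["w1"] + params["b1"])     # (B, hidden)
    prompt = h @ params["w2"] + params["b2"]              # (B, D)
    return prompt[:, None, :]                             # add a token axis


def add_dynamic_prompt(params, patch_tokens):
    """Prepend the image-conditioned prompt token to each token sequence."""
    prompt = meta_net(params, patch_tokens)
    return np.concatenate([prompt, patch_tokens], axis=1)  # (B, N + 1, D)


rng = np.random.default_rng(0)
params = init_meta_net(dim=16, hidden=4, rng=rng)
images = rng.normal(size=(2, 9, 16))   # stand-in for 2 images, 9 patches, 16-dim
tokens = add_dynamic_prompt(params, images)
print(tokens.shape)
```

Because the prompt is computed from the image's own patch embeddings, two different images receive two different prompt tokens, in contrast to a static prompt shared across the dataset. In a PETL setting, one would typically keep the pre-trained backbone frozen and train only the small prompt generator, although the abstract does not state which parameters DVPT updates.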

research
02/16/2023

Towards Efficient Visual Adaption via Structural Re-parameterization

Parameter-efficient transfer learning (PETL) is an emerging research spo...
research
06/27/2023

Approximated Prompt Tuning for Vision-Language Pre-trained Models

Prompt tuning is a parameter-efficient way to deploy large-scale pre-tra...
research
03/26/2023

BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning

With the surge of large-scale pre-trained models (PTMs), fine-tuning the...
research
07/27/2019

Learnable Parameter Similarity

Most of the existing approaches focus on specific visual tasks while ign...
research
07/25/2023

Benchmarking and Analyzing Generative Data for Visual Recognition

Advancements in large pre-trained generative models have expanded their ...
research
07/29/2023

Instance-Wise Adaptive Tuning and Caching for Vision-Language Models

Large-scale vision-language models (LVLMs) pretrained on massive image-t...
research
12/06/2022

Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning

Intermediate features of a pre-trained model have been shown informative...
