PVP: Pre-trained Visual Parameter-Efficient Tuning

04/26/2023
by Zhao Song, et al.

Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks. However, fully fine-tuning these models for downstream tasks remains highly challenging due to their high computational and storage costs. Recently, Parameter-Efficient Tuning (PETuning) techniques, e.g., Visual Prompt Tuning (VPT) and Low-Rank Adaptation (LoRA), have significantly reduced computation and storage costs by inserting lightweight prompt modules into the pre-trained models and tuning these modules with a small number of trainable parameters, while keeping the transformer backbone frozen. Although only a few parameters need to be adjusted, most PETuning methods still require a significant amount of downstream training data to achieve good results, and their performance is inadequate in low-data regimes, especially when there are only one or two examples per class. To this end, we first empirically identify that the poor performance is mainly due to an inappropriate initialization of the prompt modules, an effect that has also been observed in pre-trained language models. Next, we propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which first pre-trains the parameter-efficient tuning modules and then leverages these pre-trained modules, together with the pre-trained transformer backbone, to perform parameter-efficient tuning on downstream tasks. Experimental results on five Fine-Grained Visual Classification (FGVC) datasets and VTAB-1k demonstrate that our proposed method significantly outperforms state-of-the-art PETuning methods.
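To make the PETuning idea in the abstract concrete, the following is a minimal NumPy sketch of LoRA-style adaptation, one of the techniques cited above. It is purely illustrative and not the paper's code: all names (`lora_forward`, the dimensions, the scaling `alpha`) are assumptions. A frozen pre-trained weight matrix W is augmented with a low-rank residual B @ A, and only A and B are trained, which is why the trainable-parameter count is a tiny fraction of the full model's.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 768, 768, 8               # ViT-like hidden size, small LoRA rank

W = rng.standard_normal((d_out, d_in))        # pre-trained weight: frozen, never updated
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection (zero init, so the
                                              # model starts exactly at the pre-trained point)

def lora_forward(x, alpha=16.0):
    """Frozen path x @ W.T plus the scaled low-rank residual."""
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.standard_normal((4, d_in))            # a batch of 4 token embeddings
y = lora_forward(x)

full_params = W.size                          # parameters a full fine-tune would update
lora_params = A.size + B.size                 # parameters LoRA actually trains
print(y.shape)
print(f"trainable fraction: {lora_params / full_params:.4f}")
```

With the zero-initialized B, the forward pass initially reproduces the frozen backbone exactly; here only about 2% of the layer's parameters are trainable. PVP's observation is that how modules like A and B (or VPT's prompt tokens) are initialized matters greatly in low-data regimes, motivating pre-training the modules themselves before downstream tuning.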

Related Research

- Visual Prompt Tuning (03/23/2022): The current modus operandi in adapting pre-trained models involves updat...
- Scattered or Connected? An Optimized Parameter-efficient Tuning Approach for Information Retrieval (08/21/2022): Pre-training and fine-tuning have achieved significant advances in the i...
- MVP: Meta Visual Prompt Tuning for Few-Shot Remote Sensing Image Scene Classification (09/17/2023): Vision Transformer (ViT) models have recently emerged as powerful and ve...
- SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers (11/29/2022): Fine-tuning pre-trained language models (PLMs) achieves impressive perfo...
- Convolutional Bypasses Are Better Vision Transformer Adapters (07/14/2022): The pretrain-then-finetune paradigm has been widely adopted in computer ...
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference (10/05/2021): Transformer-based pre-trained language models can achieve superior perfo...
- OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models (07/05/2023): The scale of large pre-trained models (PTMs) poses significant challenge...
