E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

07/25/2023
by   Cheng Han, et al.

As the size of transformer-based models continues to grow, fine-tuning these large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. Parameter-efficient learning has been developed to reduce the number of tunable parameters during fine-tuning. Although these methods show promising results, there is still a significant performance gap compared to full fine-tuning. To address this challenge, we propose an Effective and Efficient Visual Prompt Tuning (E^2VPT) approach for large-scale transformer-based model adaptation. Specifically, we introduce a set of learnable key-value prompts and visual prompts into self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning. Moreover, we design a prompt pruning procedure to systematically prune low-importance prompts while preserving model performance, which largely enhances the model's efficiency. Empirical results demonstrate that our approach outperforms several state-of-the-art baselines on two benchmarks, with considerably low parameter usage (e.g., 0.32% of model parameters). Our code is available at https://github.com/ChengHan111/E2VPT.
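To make the two prompt types concrete, the sketch below shows one way to prepend learnable visual prompts to the input token sequence and to concatenate learnable key-value prompts inside self-attention, with a simple binary mask standing in for the pruning step. The class names, dimensions, and masking scheme are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch, assuming a ViT-style backbone in PyTorch (>= 2.0 for
# scaled_dot_product_attention). Names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptedSelfAttention(nn.Module):
    """Frozen self-attention augmented with learnable key-value prompts."""

    def __init__(self, dim, num_heads, num_kv_prompts):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # In practice these come from the pretrained backbone and stay frozen;
        # here they are randomly initialized for a self-contained example.
        self.qkv = nn.Linear(dim, dim * 3).requires_grad_(False)
        self.proj = nn.Linear(dim, dim).requires_grad_(False)
        # Learnable key/value prompts, shared across heads (assumption).
        self.k_prompt = nn.Parameter(torch.zeros(num_kv_prompts, dim))
        self.v_prompt = nn.Parameter(torch.zeros(num_kv_prompts, dim))
        nn.init.normal_(self.k_prompt, std=0.02)
        nn.init.normal_(self.v_prompt, std=0.02)
        # Binary mask that zeroes out pruned prompts; a stand-in for the
        # paper's importance-based pruning procedure.
        self.register_buffer("prompt_mask", torch.ones(num_kv_prompts))

    def forward(self, x):
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Concatenate (masked) prompts to keys and values only, so the
        # query length, and hence the output length, is unchanged.
        kp = (self.k_prompt * self.prompt_mask[:, None]).expand(B, -1, -1)
        vp = (self.v_prompt * self.prompt_mask[:, None]).expand(B, -1, -1)
        k = torch.cat([k, kp], dim=1)
        v = torch.cat([v, vp], dim=1)

        def split(t):  # (B, L, C) -> (B, heads, L, head_dim)
            return t.view(B, -1, self.num_heads, self.head_dim).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class VisualPromptInput(nn.Module):
    """Prepends learnable visual prompts to the patch-token sequence."""

    def __init__(self, dim, num_visual_prompts):
        super().__init__()
        self.visual_prompt = nn.Parameter(torch.zeros(num_visual_prompts, dim))
        nn.init.normal_(self.visual_prompt, std=0.02)

    def forward(self, tokens):                       # tokens: (B, N, dim)
        B = tokens.shape[0]
        prompts = self.visual_prompt.expand(B, -1, -1)
        return torch.cat([prompts, tokens], dim=1)   # (B, P + N, dim)


if __name__ == "__main__":
    x = torch.randn(2, 196, 768)                     # fake ViT patch tokens
    x = VisualPromptInput(768, num_visual_prompts=10)(x)
    y = PromptedSelfAttention(768, num_heads=12, num_kv_prompts=5)(x)
    print(y.shape)                                   # torch.Size([2, 206, 768])
```

Only the prompt parameters are trainable here; attaching prompts to keys and values rather than queries keeps the attention output the same length as the token sequence, which is what allows the frozen backbone to be reused without architectural changes.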

Related research

03/29/2022 · Parameter-efficient Fine-tuning for Vision Transformers
In computer vision, it has achieved great success in adapting large-scal...

05/26/2023 · Do We Really Need a Large Number of Visual Prompts?
Due to increasing interest in adapting models on resource-constrained ed...

10/09/2022 · SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters
Adapter Tuning, which freezes the pretrained language models (PLMs) and ...

04/17/2023 · Hyper-Decision Transformer for Efficient Online Policy Adaptation
Decision Transformers (DT) have demonstrated strong performances in offl...

09/11/2023 · Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
The Mixture of Experts (MoE) is a widely known neural architecture where...

06/20/2023 · Augmenting Sub-model to Improve Main Model
Image classification has improved with the development of training techn...

08/23/2023 · Vision Transformer Adapters for Generalizable Multitask Learning
We introduce the first multitasking vision transformer adapters that lea...
