LION: Implicit Vision Prompt Tuning

03/17/2023
by Haixin Wang, et al.

Despite recent competitive performance across a range of vision tasks, vision Transformers still suffer from heavy computational costs. Recently, vision prompt learning has provided an economical solution to this problem without fine-tuning the whole large-scale model. However, the efficiency of existing models is still far from satisfactory, due to the insertion of extensive prompt blocks and tricky prompt designs. In this paper, we propose an efficient vision model named impLicit vIsion prOmpt tuNing (LION), which is motivated by deep implicit models with stable memory costs for various complex tasks. In particular, we merely insert two equilibrium implicit layers at the two ends of the pre-trained main backbone, with the parameters in the backbone frozen. Moreover, we prune the parameters in these two layers according to the lottery ticket hypothesis. The performance obtained by our LION is promising on a wide range of datasets. In particular, our LION reduces up to 11.5% of training parameter numbers while obtaining higher performance compared with the state-of-the-art baseline VPT, especially under challenging scenes. Furthermore, we find that our proposed LION has good generalization performance, making it an easy way to boost transfer learning in the future.
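The two mechanisms the abstract describes — an equilibrium (deep implicit) layer that solves a fixed point rather than stacking blocks, and lottery-ticket-style pruning of that layer's weights — can be sketched in a toy form. This is a hypothetical illustration, not the paper's implementation: the function names, the naive fixed-point iteration (real DEQ-style layers typically use a root-finding solver such as Broyden's method), and the one-shot magnitude pruning are all assumptions made here for clarity.

```python
import numpy as np

def equilibrium_layer(x, W, U, b, tol=1e-6, max_iter=200):
    """Toy implicit layer: solve z* = tanh(W @ z* + U @ x + b) by naive
    fixed-point iteration (hypothetical stand-in for LION's equilibrium
    layers; the paper's actual solver and parameterization may differ)."""
    z = np.zeros_like(b)
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def magnitude_prune(W, keep_frac=0.5):
    """One-shot magnitude pruning: zero all but the largest-magnitude
    `keep_frac` of weights (a common lottery-ticket-style proxy; the
    paper's exact pruning schedule is not reproduced here)."""
    k = max(1, int(W.size * keep_frac))
    thresh = np.sort(np.abs(W).ravel())[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

rng = np.random.default_rng(0)
d = 8  # toy feature dimension
# Small weight scale keeps the map a contraction, so iteration converges.
W = 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)
U = rng.standard_normal((d, d)) / np.sqrt(d)
b = np.zeros(d)
x = rng.standard_normal(d)  # stands in for a frozen backbone's feature

W_pruned = magnitude_prune(W, keep_frac=0.5)
z_star = equilibrium_layer(x, W_pruned, U, b)
```

The point of the implicit formulation is that memory cost stays constant regardless of how many solver iterations run, which is the "stable memory costs" property the abstract attributes to deep implicit models; pruning the two inserted layers then shrinks the trainable parameter count further while the backbone stays frozen.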

