Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks

10/07/2022
by   Yen-Cheng Liu, et al.

Adapting large-scale pretrained models to various downstream tasks via fine-tuning is a standard method in machine learning. Recently, parameter-efficient fine-tuning methods have shown promise in adapting a pretrained model to different tasks while training only a few parameters. Despite their success, most existing methods were proposed for Natural Language Processing tasks with language Transformers, and adaptation to Computer Vision tasks with Vision Transformers remains under-explored, especially for dense vision tasks. Further, in multi-task settings, individually fine-tuning and storing separate models for different tasks is inefficient. In this work, we provide an extensive multi-task parameter-efficient benchmark and examine existing parameter-efficient fine-tuning NLP methods for vision tasks. Our results on four different dense vision tasks show that existing methods cannot be efficiently integrated due to the hierarchical nature of Hierarchical Vision Transformers. To overcome this issue, we propose Polyhistor and Polyhistor-Lite, consisting of Decomposed HyperNetworks and Layer-wise Scaling Kernels, to share information across different tasks with a few trainable parameters. This leads to favorable performance improvements against existing parameter-efficient methods while using fewer trainable parameters. Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while using only about 10% of their trainable parameters. Furthermore, our methods show larger performance gains when larger networks and more pretraining data are used.
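The abstract names two components, Decomposed HyperNetworks and Layer-wise Scaling Kernels, without giving details here. The following is a rough sketch of one way such a pair could fit together, not the authors' implementation: a small hypernetwork maps a task embedding to low-rank factors of a shared template kernel, and a per-layer scaling kernel expands that template through a Kronecker product so it matches layers of different widths, as found in a hierarchical Vision Transformer. All sizes, names (`proj_u`, `proj_v`, `template_kernel`, `layer_adapter`), and the use of a Kronecker product are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def rand_matrix(rows, cols, rng):
    """Random matrix standing in for trainable parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols matrix."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


rng = random.Random(0)
EMBED, RANK, TEMPLATE = 4, 2, 4  # hypothetical sizes, not taken from the paper

# Assumed "decomposed" hypernetwork: two small projections emit low-rank
# factors instead of one large projection emitting a full weight matrix.
proj_u = rand_matrix(EMBED, TEMPLATE * RANK, rng)
proj_v = rand_matrix(EMBED, RANK * TEMPLATE, rng)


def template_kernel(task_embedding):
    """Build a TEMPLATE x TEMPLATE kernel of rank <= RANK for one task."""
    u = reshape(matmul([task_embedding], proj_u)[0], TEMPLATE, RANK)
    v = reshape(matmul([task_embedding], proj_v)[0], RANK, TEMPLATE)
    return matmul(u, v)


def layer_adapter(task_embedding, scaling_kernel):
    """Expand the shared template to one layer's adapter shape."""
    return kron(scaling_kernel, template_kernel(task_embedding))


task = [0.5, -0.2, 0.1, 0.9]           # hypothetical task embedding
narrow_scale = rand_matrix(2, 1, rng)  # per-layer kernel for a narrow stage
wide_scale = rand_matrix(4, 2, rng)    # per-layer kernel for a wider stage

w_narrow = layer_adapter(task, narrow_scale)
w_wide = layer_adapter(task, wide_scale)
print(len(w_narrow), len(w_narrow[0]))
print(len(w_wide), len(w_wide[0]))
```

In this sketch the only per-layer parameters are the small scaling kernels, while the hypernetwork projections are shared across tasks and layers, which is one plausible reading of how the trainable parameter count could stay low.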

research
03/29/2022

Parameter-efficient Fine-tuning for Vision Transformers

In computer vision, it has achieved great success in adapting large-scal...
research
06/29/2023

An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training

We present a model that can perform multiple vision tasks and can be ada...
research
02/24/2022

Learning to Merge Tokens in Vision Transformers

Transformers are widely applied to solve natural language understanding ...
research
12/15/2022

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Vision transformers (ViTs) have achieved impressive results on various c...
research
08/23/2023

Vision Transformer Adapters for Generalizable Multitask Learning

We introduce the first multitasking vision transformer adapters that lea...
research
02/14/2020

HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing

Computation-intensive pretrained models have been taking the lead of man...
research
05/24/2023

READ: Recurrent Adaptation of Large Transformers

Fine-tuning large-scale Transformers has led to the explosion of many AI...
