VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks

12/13/2021
by   Yi-Lin Sung, et al.

Recently, fine-tuning language models pre-trained on large text corpora has yielded large improvements on vision-and-language (V&L) tasks as well as on pure language tasks. However, as model sizes grow rapidly, fine-tuning the entire parameter set of a pre-trained model becomes impractical. Hence, in this paper, we introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5. We evaluate our methods in a unified multi-task setup on four diverse V&L tasks: VQAv2, GQA, NLVR2, and MSCOCO image captioning. With careful training and thorough experiments, we benchmark three popular adapter-based methods (Adapter, Hyperformer, Compacter) against standard full fine-tuning and the recently proposed prompt-tuning approach. We also enhance the efficiency and performance of adapters by sharing their weights to attain knowledge across tasks. Our results demonstrate that training the adapter with the weight-sharing technique (4.4% of total parameters) can match the performance of fine-tuning the entire model. Lastly, we present a comprehensive analysis including the combination of adapter and task-specific prompts and the impact of V&L pre-training on adapters. Our code is available at: https://github.com/ylsung/VL_adapter.
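To make the adapter and weight-sharing ideas concrete, here is a minimal NumPy sketch of a bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) shared across several tasks. This is an illustrative simplification, not the paper's implementation: the class name, the bottleneck dimension of 48, and the zero-initialized up-projection are assumptions for the example, and the paper's Hyperformer and Compacter variants parameterize the adapter weights differently.

```python
import numpy as np

class Adapter:
    """Hypothetical bottleneck adapter: h -> h + ReLU(h @ W_down) @ W_up."""
    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.02, size=(d_model, d_bottleneck))
        self.b_down = np.zeros(d_bottleneck)
        # Zero-initialized up-projection: the adapter starts as an identity
        # map, so the frozen pre-trained model's behavior is untouched at init.
        self.W_up = np.zeros((d_bottleneck, d_model))
        self.b_up = np.zeros(d_model)

    def __call__(self, h):
        z = np.maximum(h @ self.W_down + self.b_down, 0.0)  # ReLU bottleneck
        return h + (z @ self.W_up + self.b_up)              # residual add

# Weight sharing across tasks: every task reuses one Adapter instance, so the
# number of trainable parameters stays constant as tasks are added.
shared = Adapter(d_model=768, d_bottleneck=48)
tasks = {"vqa": shared, "gqa": shared, "nlvr2": shared, "caption": shared}

h = np.random.default_rng(1).normal(size=(4, 768))  # (tokens, d_model)
out = tasks["vqa"](h)
assert out.shape == h.shape
assert np.allclose(out, h)  # identity at initialization
```

In a real multi-task setup the backbone (e.g. VL-T5) stays frozen and only the shared adapter weights are updated, which is what keeps the trainable fraction of parameters small.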


