MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

03/02/2023
by   Jingjing Jiang, et al.

Recently, finetuning pretrained vision-language models (VLMs) has become a prevailing paradigm for achieving state-of-the-art performance on VQA. However, as VLMs scale, tuning the full model parameters for a specific task in low-resource settings becomes computationally expensive, storage-inefficient, and prone to overfitting. Although current parameter-efficient tuning methods dramatically reduce the number of tunable parameters, a significant performance gap with full finetuning remains. In this paper, we propose MixPHM, a redundancy-aware parameter-efficient tuning method that outperforms full finetuning in low-resource VQA. Specifically, MixPHM is a lightweight module implemented by multiple PHM-experts in a mixture-of-experts manner. To reduce parameter redundancy, we reparameterize expert weights in a low-rank subspace and share part of the weights inside and across MixPHM. Moreover, based on our quantitative analysis of representation redundancy, we propose redundancy regularization, which encourages MixPHM to discard task-irrelevant redundancy while promoting task-relevant correlation. Experiments on VQA v2, GQA, and OK-VQA under different low-resource settings show that MixPHM outperforms state-of-the-art parameter-efficient methods and is the only one that consistently surpasses full finetuning.
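The core parameter saving behind PHM-experts can be sketched as follows. In a PHM (parameterized hypercomplex multiplication) layer, the adapter's weight matrix is not stored directly but composed as a sum of Kronecker products, which shrinks the parameter count by roughly a factor of n. This is a minimal illustrative sketch in the style of Compacter/MixPHM, not the paper's implementation; the sizes and variable names are assumptions.

```python
import numpy as np

def phm_weight(A, B):
    """Compose a (d_in, d_out) weight as a sum of Kronecker products:
        W = sum_i kron(A[i], B[i])
    A: (n, n, n) small "rule" matrices; B: (n, d_in//n, d_out//n) factors.
    This is the PHM reparameterization used by hypercomplex adapters."""
    return sum(np.kron(Ai, Bi) for Ai, Bi in zip(A, B))

# Illustrative sizes (assumed, not from the paper): n = 4 Kronecker
# factors composing a 64 -> 64 adapter projection.
n, d_in, d_out = 4, 64, 64
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n, n))
B = rng.standard_normal((n, d_in // n, d_out // n))

W = phm_weight(A, B)          # full (64, 64) weight, materialized on the fly

# PHM stores n^3 + (d_in * d_out) / n parameters instead of d_in * d_out:
phm_params = A.size + B.size  # 4^3 + 64*64/4 = 64 + 1024 = 1088
full_params = d_in * d_out    # 4096
```

On top of this, MixPHM further reparameterizes the expert weights in a low-rank subspace (e.g., factoring each B[i] into two rank-r matrices) and shares part of the factors inside and across MixPHM modules, so each additional PHM-expert adds only a small number of new parameters.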

Related research

06/08/2021 · Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
Adapting large-scale pretrained language models to downstream tasks via ...

08/31/2021 · LightNER: A Lightweight Generative Framework with Prompt-guided Attention for Low-resource NER
Most existing NER methods rely on extensive labeled data for model train...

09/14/2022 · Finetuning Pretrained Vision-Language Models with Correlation Information Bottleneck for Robust Visual Question Answering
Benefiting from large-scale Pretrained Vision-Language Models (VL-PMs), ...

01/26/2023 · Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning
Dialogue state tracking (DST) is an important step in dialogue managemen...

03/10/2023 · Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models
Medical Visual Question Answering (VQA) is an important challenge, as it...

07/07/2022 · Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation
Large pretrained language models (PLMs) are often domain- or task-adapte...

08/07/2023 · Redundancy-aware Transformer for Video Question Answering
This paper identifies two kinds of redundancy in the current VideoQA par...
