MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting

10/13/2022
by Oscar Mañas, et al.

Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models and leverages their strong generalization capabilities in multimodal vision-language (VL) settings. MAPL learns a lightweight mapping between the representation spaces of unimodal models using aligned image-text data, and can generalize to unseen VL tasks from just a few in-context examples. The small number of trainable parameters makes MAPL effective at low-data and in-domain learning. Moreover, MAPL's modularity enables easy extension to other pre-trained models. Extensive experiments on several visual question answering and image captioning benchmarks show that MAPL achieves superior or competitive performance compared to similar methods while training orders of magnitude fewer parameters. MAPL can be trained in just a few hours using modest computational resources and public datasets. We plan to release the code and pre-trained models.
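The core idea described in the abstract, a small trainable mapping that bridges a frozen vision encoder and a frozen language model, can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the class name VisionToLMMapper, all dimensions and hyperparameters, and the specific design (learned query tokens pooled through a small Transformer encoder and projected into the language model's embedding space) are assumptions for exposition, not the paper's exact architecture.

import torch
import torch.nn as nn

class VisionToLMMapper(nn.Module):
    """Illustrative sketch (not the paper's exact architecture): a small
    trainable module that maps frozen image-encoder features into a short
    sequence of "visual prefix" embeddings in the language model's input
    space. All sizes below are placeholders."""

    def __init__(self, vision_dim=768, lm_dim=2048, num_prefix_tokens=32,
                 hidden_dim=256, num_layers=2, num_heads=4):
        super().__init__()
        # Learned query tokens that summarize the image features.
        self.queries = nn.Parameter(torch.randn(num_prefix_tokens, hidden_dim))
        self.input_proj = nn.Linear(vision_dim, hidden_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.output_proj = nn.Linear(hidden_dim, lm_dim)

    def forward(self, image_features):
        # image_features: (batch, num_patches, vision_dim) from a frozen vision encoder.
        b = image_features.size(0)
        x = self.input_proj(image_features)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        # Encode learned queries jointly with the projected patch features.
        h = self.encoder(torch.cat([q, x], dim=1))
        # Keep only the query positions and project into the LM embedding space.
        prefix = self.output_proj(h[:, : self.queries.size(0)])
        return prefix  # (batch, num_prefix_tokens, lm_dim)

In a setup along these lines, only the mapper's parameters are updated on aligned image-text data, while both pre-trained models stay frozen; the resulting prefix embeddings would be prepended to the text token embeddings fed into the frozen language model when prompting it for captioning or visual question answering.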

Related research

08/02/2021 · Pre-trained Models for Sonar Images
Machine learning and neural networks are now ubiquitous in sonar percept...

08/21/2023 · Towards Accelerated Model Training via Bayesian Data Selection
Mislabeled, duplicated, or biased data in real-world scenarios can lead ...

10/20/2022 · Composing Ensembles of Pre-trained Models via Iterative Consensus
Large pre-trained models exhibit distinct and complementary capabilities...

08/30/2023 · CorrEmbed: Evaluating Pre-trained Model Image Similarity Efficacy with a Novel Metric
Detecting visually similar images is a particularly useful attribute to ...

10/26/2022 · Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Despite the excellent performance of large-scale vision-language pre-tra...

04/21/2023 · RPLKG: Robust Prompt Learning with Knowledge Graph
Large-scale pre-trained models have been known that they are transferabl...

06/22/2022 · Independent evaluation of state-of-the-art deep networks for mammography
Deep neural models have shown remarkable performance in image recognitio...
