MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

06/02/2023
by Masoud Monajatipoor, et al.

Large-scale language models have shown the ability to adapt to a new task by conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train a language model to perform in-context learning on NLP tasks (as in MetaICL); we then transfer this model to VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves in-context learning capability on VL tasks and can even compensate significantly for model size. On VQA, OK-VQA, and GQA, our method outperforms the baseline model while having 20 times fewer parameters.
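The abstract leaves the architecture details to the paper body. The sketch below is one plausible reading, assuming a Frozen/MAGMA-style design in which a visual encoder's output is linearly projected into the token-embedding space of a meta-trained (MetaICL-style) language model, so images can appear as "soft tokens" inside an in-context prompt. All module names, dimensions, and the HuggingFace-style inputs_embeds call are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch, assuming a Frozen/MAGMA-style coupling of a visual encoder
# to a language model that was meta-trained for in-context learning.
# Module names, n_vis, and the inputs_embeds interface are assumptions.
import torch
import torch.nn as nn


class VisualPrefix(nn.Module):
    """Projects one image feature vector into n_vis pseudo-token embeddings."""

    def __init__(self, vis_dim: int, lm_dim: int, n_vis: int = 4):
        super().__init__()
        self.n_vis = n_vis
        self.proj = nn.Linear(vis_dim, n_vis * lm_dim)

    def forward(self, vis_feats: torch.Tensor) -> torch.Tensor:
        # vis_feats: (batch, vis_dim) -> (batch, n_vis, lm_dim)
        batch = vis_feats.size(0)
        return self.proj(vis_feats).view(batch, self.n_vis, -1)


class MetaVLSketch(nn.Module):
    def __init__(self, meta_lm, visual_encoder, vis_dim: int, lm_dim: int):
        super().__init__()
        self.lm = meta_lm                  # meta-trained for in-context learning
        self.visual_encoder = visual_encoder
        self.prefix = VisualPrefix(vis_dim, lm_dim)
        # Keep the meta-trained LM frozen so only the visual side is learned
        # (a common choice in this line of work; the paper may differ).
        for p in self.lm.parameters():
            p.requires_grad = False

    def forward(self, images: torch.Tensor, text_embeds: torch.Tensor):
        # images: raw pixels; text_embeds: (batch, seq, lm_dim) token embeddings
        # of the in-context demonstrations plus the query text.
        vis_tokens = self.prefix(self.visual_encoder(images))
        # Prepend visual tokens so the LM treats the image as part of the prompt.
        inputs = torch.cat([vis_tokens, text_embeds], dim=1)
        # Assumes a HuggingFace-style LM that accepts precomputed embeddings.
        return self.lm(inputs_embeds=inputs)
```

At inference time, a k-shot VL prompt would interleave k (image, question, answer) demonstrations followed by the query image and question, mirroring how MetaICL formats text-only demonstrations.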


Related research

07/15/2023  SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
Large Pre-trained Transformers exhibit an intriguing capacity for in-con...

05/22/2023  Meta-in-context learning in large language models
Large language models have shown tremendous performance in a variety of ...

12/18/2022  Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale
Language models have been shown to perform better with an increase in sc...

01/31/2023  What Makes Good Examples for Visual In-Context Learning?
Large-scale models trained on broad data have recently become the mainst...

09/09/2023  EPA: Easy Prompt Augmentation on Large Language Models via Multiple Sources and Multiple Targets
Large language models (LLMs) have shown promising performance on various...

08/15/2023  RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
In this paper, we investigate the in-context learning ability of retriev...

04/17/2023  Towards Robust Prompts on Vision-Language Models
With the advent of vision-language models (VLMs) that can perform in-con...
