A Good Prompt Is Worth Millions of Parameters? Low-resource Prompt-based Learning for Vision-Language Models

10/16/2021
by   Woojeong Jin, et al.
0

Large pretrained vision-language (VL) models can learn a new task with a handful of examples or generalize to a new task without fine-tuning. However, these gigantic VL models are hard to deploy for real-world applications due to their impractically huge model size and slow inference speed. In this work, we propose FewVLM, a few-shot prompt-based learner on vision-language tasks. We pretrain a sequence-to-sequence Transformer model with both prefix language modeling (PrefixLM) and masked language modeling (MaskedLM), and introduce simple prompts to improve zero-shot and few-shot performance on VQA and image captioning. Experimental results on five VQA and captioning datasets show that outperforms Frozen which is 31 times larger than ours by 18.2 point on zero-shot VQAv2 and achieves comparable results to a 246× larger model, PICa. We observe that (1) prompts significantly affect zero-shot performance but marginally affect few-shot performance, (2) MaskedLM helps few-shot VQA tasks while PrefixLM boosts captioning performance, and (3) performance significantly increases when training set size is small.

READ FULL TEXT
research
12/21/2022

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

Large language models (LLMs) have demonstrated excellent zero-shot gener...
research
03/14/2022

CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment

CLIP has shown a remarkable zero-shot capability on a wide range of visi...
research
03/20/2023

eP-ALM: Efficient Perceptual Augmentation of Language Models

Large Language Models (LLMs) have so far impressed the world, with unpre...
research
03/23/2023

Top-Down Visual Attention from Analysis by Synthesis

Current attention algorithms (e.g., self-attention) are stimulus-driven ...
research
08/06/2021

Towards Zero-shot Language Modeling

Can we construct a neural model that is inductively biased towards learn...
research
06/13/2022

Language Models are General-Purpose Interfaces

Foundation models have received much attention due to their effectivenes...
research
03/15/2022

Representation Learning for Resource-Constrained Keyphrase Generation

State-of-the-art keyphrase generation methods generally depend on large ...

Please sign up or login with your details

Forgot password? Click here to reset