Patch-Token Aligned Bayesian Prompt Learning for Vision-Language Models

03/16/2023
by Xinyang Liu, et al.

For downstream applications of pre-trained vision-language models, constructing effective prompts has attracted significant interest. Existing approaches to prompt engineering either require laborious manual design or treat prompt tuning as a point-estimation problem, which may fail to describe the diverse characteristics of categories and limits their applicability. We introduce a Bayesian probabilistic treatment of prompt learning, in which label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then passing it through a lightweight generative model. Importantly, we semantically regularize prompt learning with visual knowledge: viewing images and their corresponding prompts as patch and token sets under optimal transport pushes the prompt tokens to faithfully capture label-specific visual concepts rather than overfit the training categories. The proposed model also extends straightforwardly to the conditional case, where instance-conditional prompts are generated to improve generalizability. Extensive experiments on 15 datasets show the promising transferability and generalization performance of the proposed model.
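The two ingredients described above, hierarchical stochastic prompt generation and optimal-transport alignment between patch and token sets, lend themselves to a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the names `StochasticPromptGenerator` and `sinkhorn_alignment`, all hyperparameters (`latent_dim=64`, `num_tokens=4`, `token_dim=512`, `eps=0.1`, `iters=50`), and the random stand-in for patch features are hypothetical choices; in the actual method the patch features would come from a frozen vision encoder such as CLIP's.

```python
# Minimal sketch of Bayesian stochastic prompt generation plus an
# entropic-OT (Sinkhorn) alignment loss. Illustrative only; all names
# and hyperparameters are assumptions, not the paper's code.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticPromptGenerator(nn.Module):
    """Hierarchical prompt sampling: draw a class-specific latent vector
    from a learned Gaussian, then decode it into M prompt token
    embeddings with a lightweight MLP generator."""

    def __init__(self, num_classes: int, latent_dim: int = 64,
                 num_tokens: int = 4, token_dim: int = 512):
        super().__init__()
        # Per-class diagonal Gaussian over the latent space.
        self.mu = nn.Parameter(0.02 * torch.randn(num_classes, latent_dim))
        self.log_sigma = nn.Parameter(torch.zeros(num_classes, latent_dim))
        # Lightweight generative model: latent vector -> M prompt tokens.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.GELU(),
            nn.Linear(256, num_tokens * token_dim),
        )
        self.num_tokens, self.token_dim = num_tokens, token_dim

    def forward(self, class_ids: torch.Tensor) -> torch.Tensor:
        mu, log_sigma = self.mu[class_ids], self.log_sigma[class_ids]
        # Reparameterization trick keeps the sampling differentiable.
        z = mu + log_sigma.exp() * torch.randn_like(mu)
        tokens = self.decoder(z)
        return tokens.view(-1, self.num_tokens, self.token_dim)


def sinkhorn_alignment(patches: torch.Tensor, tokens: torch.Tensor,
                       eps: float = 0.1, iters: int = 50) -> torch.Tensor:
    """Entropic optimal transport between patch features (N, D) and prompt
    token embeddings (M, D) with uniform marginals; the transport cost
    serves as the patch-token alignment regularizer."""
    # Cosine cost between L2-normalized patches and tokens.
    cost = 1.0 - F.normalize(patches, dim=-1) @ F.normalize(tokens, dim=-1).T
    n, m = cost.shape
    log_a = torch.full((n,), -math.log(n), device=cost.device)
    log_b = torch.full((m,), -math.log(m), device=cost.device)
    u = torch.zeros(n, device=cost.device)
    v = torch.zeros(m, device=cost.device)
    for _ in range(iters):  # log-domain Sinkhorn updates for stability
        s = (-cost + u[:, None] + v[None, :]) / eps
        u = u + eps * (log_a - torch.logsumexp(s, dim=1))
        s = (-cost + u[:, None] + v[None, :]) / eps
        v = v + eps * (log_b - torch.logsumexp(s, dim=0))
    plan = torch.exp((-cost + u[:, None] + v[None, :]) / eps)
    return (plan * cost).sum()


# Toy usage: align 49 stand-in image patches with 4 sampled prompt
# tokens for class 3 (random features here, CLIP patches in practice).
gen = StochasticPromptGenerator(num_classes=10)
prompt_tokens = gen(torch.tensor([3]))[0]   # (4, 512)
patch_feats = torch.randn(49, 512)
ot_loss = sinkhorn_alignment(patch_feats, prompt_tokens)
```

Because the sampling is reparameterized and Sinkhorn is differentiable, the OT cost can be added to the task loss and backpropagated into both the per-class Gaussian parameters and the generator.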

Related research

03/10/2022 · Conditional Prompt Learning for Vision-Language Models
With the rise of powerful pre-trained vision-language models like CLIP, ...

03/12/2021 · Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability
In this paper, we investigate whether the power of the models pre-traine...

09/02/2021 · Learning to Prompt for Vision-Language Models
Vision-language pre-training has recently emerged as a promising alterna...

07/07/2022 · Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space
This paper addresses an important problem of ranking the pre-trained dee...

10/05/2022 · Variational prompt tuning improves generalization of vision-language models
Prompt tuning provides an efficient mechanism to adapt large vision-lang...

01/05/2023 · GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
A key goal for the advancement of AI is to develop technologies that ser...

08/04/2023 · Prompt2Gaussia: Uncertain Prompt-learning for Script Event Prediction
Script Event Prediction (SEP) aims to predict the subsequent event for a...